Compressing pictures (LONG)

William Lewis wiml at milton.u.washington.edu
Thu Aug 30 18:00:46 AEST 1990


In article <sean.651965645 at s.ms.uky.edu> sean at ms.uky.edu (Sean Casey) writes:
>Doesn't GIF allow a "plug-in" compressor? Surely it doesn't _require_ one
>to use LZW... does it?

   Yes, it *does* require you to use LZW, with no provision that I know
of for other compression methods, at least in the GIF87a spec (GIF90 or
whatever it was that was just released may be different, but I doubt it;
besides, if we don't support the old GIF decoders we may as well go
with a completely new format). This is one of my gripes with the
GIF format, and one reason I liked the .PKX format better (it supported
pixel aspect ratios too =8) ), except that it died an early and local death...

   Anyway ... I'd like to add my own thoughts to this discussion. I've
been thinking about something like this for some time, though I haven't
done any actual experimentation yet. A couple of points however ... (Keep
in mind that everything in this post should be prefaced with "I think", 
"IMHO", or "What is everyone else's opinion on"; but as a rudimentary
form of article compression I have collapsed all of those notices into
this one.)

   * Taking differences is probably much, much better than XORing,
since the point is to take advantage of the vertical continuity of
the image. The problem is that it roughly doubles the number of values
the compressor has to deal with -- the differences range from
-(max-min) to +(max-min), about twice as many values as before. The
obvious solution is to wrap around, i.e., use modulo (max-min+1)
arithmetic (modulo 256 for 8-bit samples), except that this overlays
some of the least frequently used values (the extremes) on the most
frequently used ones (small values), making the Huffman/arithmetic
coder less efficient. I don't know whether this outweighs the advantage
gained from the smaller number of values, though; I guess that would be
determined by experiment. Or, since it is a fairly trivial-to-code
change, it could be decided individually for each image, based on that
image's characteristics. This would only take extra effort in the
compression stage, not decompression, which is what we want.
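
   For concreteness, here is a rough C sketch of the differencing step,
assuming 8-bit samples stored one per byte (so the wrap-around is just
the natural modulo-256 arithmetic of unsigned chars). Untested, purely
to show the idea:

    /* Vertical differencing with wrap-around: replace each sample by its
       difference from the sample directly above, modulo 256 so the result
       still fits in one byte.  (Row layout is simplified for illustration.) */
    void diff_rows(unsigned char *img, int width, int height)
    {
        int x, y;
        /* work bottom-up so each row still sees the original row above it */
        for (y = height - 1; y > 0; y--)
            for (x = 0; x < width; x++)
                img[y*width + x] = (unsigned char)
                    (img[y*width + x] - img[(y-1)*width + x]);
        /* first row is left alone (or could be differenced horizontally) */
    }

    void undiff_rows(unsigned char *img, int width, int height)
    {
        int x, y;
        /* decompression: add the row above back in, working top-down */
        for (y = 1; y < height; y++)
            for (x = 0; x < width; x++)
                img[y*width + x] = (unsigned char)
                    (img[y*width + x] + img[(y-1)*width + x]);
    }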

   * Arithmetic coding (or Huffman) is probably better than Lempel-Ziv.
LZW compresses commonly repeated strings in the data, and those will be
rare in images with many colors or grayscales, since such images have
a lot of `noise'. Huffman or arithmetic coding compresses frequently-
occurring single sample values, which should be nicely grouped already
because of the differencing in the previous step.
   For those less familiar with arithmetic coding: arithmetic coding and
Huffman coding are very similar. Both attempt to compress the data by
representing more common values with shorter bitstrings. However,
arithmetic coding has a number of advantages over Huffman; see below.
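
   To illustrate the basic mechanism (a toy only -- the probabilities are
made up, and a real coder would use integer arithmetic with
renormalization): each symbol narrows an interval in proportion to its
probability, so common symbols barely shrink the interval and cost only
a fraction of a bit each.

    #include <stdio.h>

    /* Toy arithmetic-coding illustration: the [low, high) interval is
       narrowed by each symbol in proportion to that symbol's probability.
       Model: symbol 0 has p=0.7, symbol 1 has p=0.2, symbol 2 has p=0.1. */
    int main(void)
    {
        double cum[4] = { 0.0, 0.7, 0.9, 1.0 };  /* cumulative probabilities */
        int    msg[5] = { 0, 0, 1, 0, 2 };
        double low = 0.0, high = 1.0, range;
        int    i, s;

        for (i = 0; i < 5; i++) {
            s = msg[i];
            range = high - low;
            high = low + range * cum[s+1];
            low  = low + range * cum[s];
        }
        /* Any number inside [low, high) identifies the whole message; a
           frequent symbol barely shrinks the interval, which is something
           Huffman's whole-bit codes can't do. */
        printf("message codes to the interval [%.10f, %.10f)\n", low, high);
        return 0;
    }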

   * Pixel data are not the entire image; obviously, any format
needs some sort of header, if only to include a magic number and image
dimensions. Palette, title, pixel aspect ratio, output device of
preference, etc., may also be useful. However, all of these
should have some value which indicates "I don't know" to the
decoder; pictures converted from .GIF wouldn't have a well-known
aspect ratio, for instance. The decoder could use the most convenient
ratio or attempt to guess from the image's dimensions and its
knowledge of popular scanners and displays. Or even (gasp!) ask
the user. =8)
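
   Something along these lines, say (the field names, widths, and magic
value are invented on the spot, not a proposal; the point is just that
every optional field has an explicit "unknown" value):

    /* Hypothetical header layout, for illustration only. */
    struct pic_header {
        unsigned long  magic;            /* identifies the format */
        unsigned short width, height;    /* image dimensions in pixels */
        unsigned short aspect_num;       /* pixel aspect ratio as a fraction; */
        unsigned short aspect_den;       /*   0/0 means "unknown, decoder's choice" */
        unsigned char  bits_per_pixel;
        unsigned char  has_palette;      /* 0 = none, 1 = palette follows header */
    };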

   * Since this format is intended to be used on the Usenet, which
is not exactly a transparent medium (8-bit or otherwise...), it should
probably have some uuencode-like text form. Ideally this would be done
similarly to PBMPLUS' "packed" vs. "normal" formats, except "packed"
would be the norm for storage and transmission on 8-bit-clean paths
(like FTP). A different "magic number" in the header could let the
decoder differentiate. It would be a good idea to take some hints from
xxdecode and ABE (if that's the right program; I haven't really
looked at either) as to error tolerance and detection. One of
the problems in a.s.p. seems to be that signatures and the like in
multi-part postings get interpreted by uudecode, inserting spurious
data into the decoded file, which throws the LZW off; maybe
the encoder could directly support splitting into parts (this
gets more complex with every sentence ... =8) ). Arithmetic
coding would be able to handle this fairly easily by setting the
output character set to some carefully selected subset of ASCII.
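
   The simpler fallback is a uuencode-style armor: pack three raw bytes
into four characters drawn from a 64-character alphabet chosen to
survive mail and news transport. Roughly (the alphabet here is an
arbitrary example, not a proposal):

    /* Sketch of a uuencode-style text armor: 3 raw bytes -> 4 printable
       characters from a 64-character alphabet.  The alphabet below is an
       arbitrary example chosen to avoid characters that gateways mangle. */
    static const char alphabet[65] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-";

    void encode3(const unsigned char in[3], char out[4])
    {
        out[0] = alphabet[in[0] >> 2];
        out[1] = alphabet[((in[0] & 0x03) << 4) | (in[1] >> 4)];
        out[2] = alphabet[((in[1] & 0x0f) << 2) | (in[2] >> 6)];
        out[3] = alphabet[in[2] & 0x3f];
    }

   A per-line length count (as uuencode already has) or a simple per-line
checksum would give the decoder a way to spot inserted signature lines
and skip them instead of silently swallowing them.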

   * Arithmetic coding has been proved by someone to be "at least as
good as" Huffman, and in many cases better, because individual symbols
don't have to take up an integer number of bits. Also, adaptive
arithmetic coding seems to be a lot easier to implement than adaptive
Huffman, since you can just readjust the intervals. However, since the
input data has already been somewhat processed by the differencing, it
might not be necessary to use adaptive coding at all (it would make the
postings smaller, but it would also make the compressor bulkier, slower,
and more bug-prone...). Again, something for experiment to determine.
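
   The "just readjust the intervals" part would amount to something like
this (the counts, increment, and rescaling threshold are all arbitrary
choices, just to show the shape of it):

    /* Sketch of an adaptive model: keep a count per symbol value, bump it
       each time that value is coded, and derive the coding intervals from
       the running counts. */
    #define NSYM 256

    static unsigned int count[NSYM];
    static unsigned int total;

    void model_init(void)
    {
        int i;
        for (i = 0; i < NSYM; i++) count[i] = 1;  /* never let a count be zero */
        total = NSYM;
    }

    void model_update(int sym)
    {
        int i;
        count[sym] += 32;
        total += 32;
        if (total > 65536L) {          /* rescale so the coder's fixed-point */
            total = 0;                 /* arithmetic doesn't overflow */
            for (i = 0; i < NSYM; i++) {
                count[i] = (count[i] + 1) / 2;
                total += count[i];
            }
        }
    }
    /* The coder then treats symbol s as having probability count[s]/total. */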

  ... looks like this has expanded into more than "a couple
of ideas" ... comments, anyone? opinions? flames? I think I'll
try experimenting with this; does anyone have a source of
good large-color-palette images (I'd use a.s.p, but the other 
people in the lab would look askance =8) )? 


-- 
wiml at blake.acs.washington.edu       Seattle, Washington  | No sig under
(William Lewis)  |  47 41' 15" N   122 42' 58" W  |||||||| construction


