USENIX Board Studies UUCP: Compression

Tom Limoncelli limonce at pilot.njin.net
Fri Dec 1 16:15:12 AEST 1989


I see two major ways to compress the news batches even more.

1 -- First, don't send messages that have already arrived at the site.
I know the algorithm used now is great, but it is not near-optimal.
How about an "I have xxxx" / "Send me yyy" negotiation?  The two
machines could hang up, do the batching/compressing/fooing/baring, then
dial up again and transmit the batches.
(Maybe this implementation isn't optimal, but I think some kind of
negotiation needs to be worked up.)
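
A minimal sketch of the "I have" / "Send me" exchange described above,
in Python.  The function names, the dict-based outgoing queue, and the
set-based history are assumptions made for illustration; this is not how
any existing news software is known to do it.

```python
def ihave(outgoing):
    """Sender side: offer the message IDs queued for the peer."""
    return sorted(outgoing)


def sendme(offered_ids, history):
    """Receiver side: request only the IDs missing from local history."""
    return [msgid for msgid in offered_ids if msgid not in history]


def build_batch(outgoing, wanted_ids):
    """Sender side: batch only the articles the receiver asked for."""
    return [outgoing[msgid] for msgid in wanted_ids if msgid in outgoing]


# Hypothetical data: three queued articles, one already at the far site.
outgoing = {
    "<1@a>": "first article",
    "<2@a>": "second article",
    "<3@b>": "third article",
}
history = {"<2@a>"}

wanted = sendme(ihave(outgoing), history)   # done on the first call
batch = build_batch(outgoing, wanted)       # compressed and sent on the second
```

Only the ID lists cross the wire during the first call, so the expensive
batching and compressing happens offline on exactly the articles needed.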

2 -- A new compression scheme.  We all know that the current one is
quite good, but I have one addition.  I can compress the following
four lines (not including blanks):

In article <93061 at pyramid.pyramid.com> romain at pyramid.pyramid.com (Romain Kang) writes:
> Clearly, the Telebit 'g' spoof works well for us now.  Philosophically,
> though, it bothers me that we have an OS as vendor-independent as UNIX,
> yet we are so dependent on Telebit.  Ideally, other equipment should be

as a string like "<93061 at pyramid.pyramid.com>1:3".  This would mean
something like "message '<93061 at pyramid.pyramid.com>', lines 1 thru 3".
This is 30 bytes instead of about 300 bytes.  This requires that (1) the
other site already has that message and (2) the user did only trivial
editing of the old post.
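
A rough sketch of the sending side of this scheme, in Python.  It assumes
the sender keeps the body lines of known articles in a dict keyed by
message ID, that quotes are prefixed with "> ", and that line numbers
count body lines from 1; the name encode_quote and the data layout are
hypothetical.

```python
import re

# Matches the attribution line and captures the message ID.
ATTRIBUTION = re.compile(r"^In article (<[^>]+>) .* writes:$")


def encode_quote(lines, store):
    """Return '<id>first:last' for an unedited quote, else None."""
    match = ATTRIBUTION.match(lines[0]) if lines else None
    if match is None:
        return None
    msgid = match.group(1)
    original = store.get(msgid)
    if original is None:
        return None  # requirement (1): the article must be known
    body = lines[1:]
    if not body or not all(line.startswith("> ") for line in body):
        return None
    quoted = [line[2:] for line in body]
    # Requirement (2): the quote must be a verbatim run of original lines.
    for start in range(len(original) - len(quoted) + 1):
        if original[start:start + len(quoted)] == quoted:
            return f"{msgid}{start + 1}:{start + len(quoted)}"
    return None


store = {
    "<93061 at pyramid.pyramid.com>": [
        "Clearly, the Telebit 'g' spoof works well for us now.  Philosophically,",
        "though, it bothers me that we have an OS as vendor-independent as UNIX,",
        "yet we are so dependent on Telebit.  Ideally, other equipment should be",
    ]
}
post = [
    "In article <93061 at pyramid.pyramid.com> "
    "romain at pyramid.pyramid.com (Romain Kang) writes:"
] + ["> " + line for line in store["<93061 at pyramid.pyramid.com>"]]

ref = encode_quote(post, store)
```

A quote that has been reworded fails the verbatim match and is simply
sent as ordinary text, which is why "clean" edits matter.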

(1) can be solved by intelligent software.  If you're sending the
quoted message in the same batch, you know the other site will have it.
You could also get smarter with the ihave/sendme negotiation.

(2) can be solved by letting users know that it's "nice" to do "clean"
edits.

Some interesting notes:  The header ("In article...") can be
generated entirely by the destination site.
Usenet software would reach a hypertext state.  Software could store
the messages in a similar format, saving disk space.  The user could
(point|cursor|select) a quote and the software could pop up the entire
article.
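
A small sketch of the receiving side, in Python, showing how the
"In article..." header could be rebuilt from the stored article.  The
storage layout (a dict with "from" and "lines" fields) and the name
expand_reference are assumptions for illustration only.

```python
import re

# '<message-id>first:last', with 1-based inclusive line numbers.
REFERENCE = re.compile(r"^(<[^>]+>)(\d+):(\d+)$")


def expand_reference(ref, store):
    """Turn a reference back into the attribution line plus quoted lines."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"not a quote reference: {ref!r}")
    msgid = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    article = store[msgid]  # KeyError if the site lacks the article
    # The header is derived locally; it never travels over the wire.
    header = f"In article {msgid} {article['from']} writes:"
    return [header] + ["> " + line for line in article["lines"][first - 1:last]]


store = {
    "<93061 at pyramid.pyramid.com>": {
        "from": "romain at pyramid.pyramid.com (Romain Kang)",
        "lines": [
            "Clearly, the Telebit 'g' spoof works well for us now.  Philosophically,",
            "though, it bothers me that we have an OS as vendor-independent as UNIX,",
            "yet we are so dependent on Telebit.  Ideally, other equipment should be",
        ],
    }
}

expanded = expand_reference("<93061 at pyramid.pyramid.com>1:3", store)
```

Since the reference carries the message ID, a reader could follow it back
to the full article in the same lookup.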

Some interesting caveats:  The example had 10:1 compression, though
other cases will be much worse (and others will be much better).
Even then, not too much of an article is "quoted" material.  Or is it?
I wonder if anyone keeps statistics on such a thing.  Yes, users are
supposed to "summarize when quoting", but they don't; and this would
make the point moot.  Is this good or bad?  Go figure.

What do you think?

-Tom
-- 
Tom Limoncelli -- limonce at pilot.njin.net -- tlimonce at drunivac.bitnet
rutgers!njin!tlimonce -- Drew University, Madison, NJ -- 201-408-5389
"All's well that ends well... if your a functional rationalist."


