Archive-name

Edward Vielmetti emv at math.lsa.umich.edu
Wed Aug 22 12:39:22 AEST 1990


In article <193 at n4hgf.Mt-Park.GA.US> wht at n4hgf.Mt-Park.GA.US (Warren Tucker) writes:

   Arkive-Nombre: diatribe-blabber/part01

   >Also, how are users supposed to know what's a good name to put in
   >the Archive-name header?

   It is Very Handy when you are looking for a program named 'foo,'
   say, and you do not know that it was posted in Volume 4,
   Issues 12-14, patched months later in Volume 6, Issue 5 and
   patched again months later in Volume 7, Issue 10.  Instead, you
   just need look up 'foo' to find:

   foo/part01 foo/part02 foo/part03 foo/patch01 foo/patch02

Well...there's a problem here, one which I understand librarians refer
to as "authority control".  Say you are looking for a program named 
"shar", which I understand is a very popular name for people to 
give to their programs.  You think that your program is the One
True Shar, but other people differ.  The alt.sources archivist(s) have
to make that decision, one way or another.   One reasonable solution
that has been used in the Gnu Emacs Lisp library collection is to 
prefix the name of the package with the author's name, so it
would be
   wht-foo/part01 wht-foo/part02 wht-foo/part03 ...
to disambiguate between authors.  If a separate posting comes around
with header information, it might even be sensible to override the
author's ill-advised Archive-name choice with a better one.  Even worse,
a poster might forget, or not have easy access to, all of the various
Archive-name headers that people have used over the life of the group,
and so pick a name that collides with an existing one by accident.
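
A rough sketch of the author-prefix idea plus a collision check, in
Python.  The names qualified_name and NameRegistry are made up for
illustration; no archiving software I know of works exactly this way.

```python
def qualified_name(author, package):
    """Prefix the package name with the author's short name, e.g. wht-foo."""
    return f"{author}-{package}"


class NameRegistry:
    """Hypothetical record of every Archive-name the archive has accepted."""

    def __init__(self):
        self._seen = set()

    def claim(self, author, package, part):
        """Return the full Archive-name, or raise if it was already used."""
        name = f"{qualified_name(author, package)}/{part}"
        if name in self._seen:
            raise ValueError(f"Archive-name collision: {name}")
        self._seen.add(name)
        return name
```

Two authors who both call their program "foo" then end up with distinct
names, while a repeated name from the same author is flagged rather than
silently overwritten.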

Another substantial problem with alt.sources is version control.  The
system is explicitly designed (hm, seems to have worked out to be) to
let people post multiple revisions of a package in quick succession.  Not
all authors are equally conscientious about keeping version information
around.  My hack for this for comp.archives is to use the date as the
version string, so for one-part stuff it might look like
   wht-foo/21-Aug-90
which is OK unless you get two postings on the same day, or a multipart
posting.
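
For what it's worth, the date-as-version hack can be sketched like this.
The numeric suffix for a second posting on the same day is only one
possible workaround that I am assuming here, not something comp.archives
actually does, and the function names are invented.

```python
import datetime

# Spelled out by hand so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def date_version(posted):
    """Format a date as a version string such as 21-Aug-90."""
    return f"{posted.day:02d}-{MONTHS[posted.month - 1]}-{posted.year % 100:02d}"


def unique_version(posted, taken):
    """Return the date string, adding .2, .3, ... if it is already taken.

    The suffix scheme is an assumed extension for same-day postings.
    """
    base = date_version(posted)
    candidate = base
    n = 2
    while candidate in taken:
        candidate = f"{base}.{n}"
        n += 1
    return candidate
```

So a second wht-foo posting on the same day would be filed as
wht-foo/21-Aug-90.2 under this assumption.  Multipart postings would
still need the part number worked in somewhere.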

Alt.sources gets a fair amount of stuff, and it's pretty diverse;
comp.archives even more so.  As a result a naive application of 
Archive-name: as a file name to store the article in is going to
break down as soon as your directory starts to fill up with hundreds
of entries, or even thousands.  So you need to split the archive into
volumes, one per year, quarter, or month depending on traffic.
These files would then be kept in e.g.
   /usenet/alt.sources/vol.90.3Q/wht-foo/21-Aug-90.Z
which still lets you grep on foo or do 
   ls /usenet/alt.sources/*/*foo*
to find things.
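
Here is a small sketch of how such a path could be put together with
quarterly volumes.  The function names are hypothetical, and the ".Z"
suffix just assumes the article is stored compressed, as in the example
path above.

```python
import datetime
import posixpath


def volume_name(posted):
    """Name the quarterly volume for a posting date, e.g. vol.90.3Q."""
    quarter = (posted.month - 1) // 3 + 1
    return f"vol.{posted.year % 100:02d}.{quarter}Q"


def archive_path(root, group, archive_name, posted):
    """Build root/group/volume/archive_name.Z for one article."""
    return posixpath.join(root, group, volume_name(posted), archive_name + ".Z")
```

Switching to yearly or monthly volumes only means changing volume_name;
the Archive-name part of the path stays the same, so the ls trick keeps
working.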

I don't know that Archive-name is the be-all and end-all of things.
Certainly if you could extract the README and other internal
documentation and make it accessible for a full-text search, you'd
enable even more arbitrary and complex searches.  Similarly, if the
author information were easily visible, you could search for things
like "all programs written by wht"; that's not easy to do in any of
the indexes of usenet archives that I'm aware of.  Archive-name does
have the very nice property of being a sensible way to store postings,
one file per article, with reasonable grouping and meaningful file
names.  And popular archiving software understands it.
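
To make the "all programs written by wht" idea concrete, here is a
minimal sketch of an author index built from article headers.  It
assumes the articles are available as plain text with RFC 822 style
headers and that the login part of the From: address is a good enough
author key; index_by_author is a made-up name, not part of any existing
archiver.

```python
import email
from collections import defaultdict
from email.utils import parseaddr


def index_by_author(articles):
    """Map each poster's login name to the Archive-names they posted.

    `articles` is an iterable of raw article texts (headers plus body).
    Articles without an Archive-name header are skipped.
    """
    index = defaultdict(list)
    for text in articles:
        msg = email.message_from_string(text)
        name = msg["Archive-name"]
        if name is None:
            continue
        # parseaddr returns (real name, address); only the address is used.
        _, address = parseaddr(msg.get("From", ""))
        login = address.split("@")[0]
        index[login].append(name)
    return index
```

Looking up index["wht"] would then list every Archive-name posted from
that login, which is the kind of query the existing indexes don't make
easy.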

Fertile field for research and standardization, perhaps.  I understand
that the ANSI committee Z39.67 on "computer software description" has
a draft available, $25 to NISO, PO Box 1056, Bethesda MD, but that this
document concerns itself mostly with descriptions and cataloging of
shrink-wrapped commercial software and not the less orderly stuff
that flows through usenet.  (Haven't read it myself.)

--Ed

Edward Vielmetti, U of Michigan math dept <emv at math.lsa.umich.edu>
moderator, comp.archives


