Long filenames (was: What kinds of things would you want in the GNU OS?)

Tue Jun 6 14:07:16 AEST 1989

In article <9422 at alice.UUCP>, andrew at alice (Andrew Hume) writes:

>	my point is that if you have structured names like
><machine><report>-<option>.<time_period> (barnett's example),
>the "lazy part" is putting that in the filename and using the
>shell's pattern matching to select files. The alternative
>(there are obviously bunches) is to put this database in a file
>that looks like
><datafile>\t<machine>\t<report>\t<option>\t<time_period>
>and use awk (or cut) to select on arbitrary fields. e.g.
>	more `awk '$2=="mymachine"{print $1}' data`
>
>this is only slightly more work, much more flexible AND
>doesn't require the kernel to support gargantuan filenames.

Wrong. It would require much more work and be much less flexible.

Example:
	If I wanted to print out all weekly sa -in and sa -im reports
for machines vaxA and sunB that occurred in January, I could type:
My Method:
	print {vaxA,sunB}*sa-i[nm]*Jan*WEEK
Your method:
	print `awk '$2 ~ /vaxA|sunB/ && $3 == "sa" && $4 ~/i[nm]/ && \
		$5 ~ /Jan.*WEEK/  {print $1}' data `

Disadvantages of your method:
	1. Simple queries now require either an AWK programmer or
	   a sophisticated script.
	2. The file "data" must be kept up to date. If 50 files are created
	   a day, and files may be deleted whenever the disk fills up,
	   keeping this file current requires an extra step every time.
	3. The biggest disadvantage is that if dozens of scripts were written
	   and it became necessary to change the database format, all of the
	   scripts would have to be rewritten.

If I had had to re-implement my report scheme on a system with filenames
limited to 14 characters, it would have taken me twice as long.

There are so many advantages to long filenames:

I have never had a problem with a shell script that does
	mv "$1" "$1.orig"
even when $1 is already 14 or more characters long.
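On a filesystem that silently truncates names to 14 characters, appending ".orig" can destroy the backup's name. The sketch below only simulates that truncation (it does not reproduce real kernel behavior); truncate14 and backup_name_ok are hypothetical names, and sa-report-vaxA is a made-up filename that happens to be exactly 14 characters:

```shell
# Simulate a filesystem that silently truncates names to 14 characters.
# (Illustration only: a real old kernel did this inside the filesystem.)
truncate14() {
    printf '%.14s\n' "$1"
}

# Succeeds only if "$1.orig" would survive the 14-character limit intact.
backup_name_ok() {
    [ "$(truncate14 "$1.orig")" = "$1.orig" ]
}

truncate14 sa-report-vaxA.orig    # prints sa-report-vaxA, the ORIGINAL name
if backup_name_ok sa-report-vaxA; then
    echo "safe"
else
    echo "backup name would be truncated"
fi
```

With the 14-character limit, `mv sa-report-vaxA sa-report-vaxA.orig` would just name the file onto itself, so no backup is made at all.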

I have enabled GNU Emacs's numbered-backup mechanism, so that
old versions are renamed file.~1~, file.~2~, ... file.~12~, etc.
(By default GNU Emacs keeps the two oldest and two newest versions.)
If I used this scheme on a system with a 14-character limit, all of my
non-SCCS awk scripts would be limited to five-character base names
(e.g. abcde.awk.~12~ is already 14 characters).
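The five-character figure is simple subtraction: the suffix .awk.~12~ is nine characters, leaving 14 - 9 = 5 for the base name. A tiny helper (room_for_base is a hypothetical name) makes that arithmetic explicit for any limit and suffix:

```shell
# How many characters remain for a base name once a fixed suffix is
# appended, under a given filename-length limit?
room_for_base() {
    limit=$1     # e.g. 14 on an old System V filesystem
    suffix=$2    # e.g. ".awk.~12~" (awk script plus an Emacs numbered backup)
    echo $(( limit - ${#suffix} ))
}

room_for_base 14 '.awk.~12~'    # prints 5
room_for_base 14 '.awk.~1~'     # prints 6
```

Note that the room shrinks as the backup number grows: a name that fits at ~9~ no longer fits at ~10~.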

I can also use filenames to indicate the function of the script.
(e.g. "archive-to-tape-old-newsgroups" vs. "ar2tape-oldng")

But the biggest win is the ability to use the filename for the data.

Another example is the large USENET archive I keep.
First of all, I store old articles using the format
	./news.group/yy-mm/article-id
(The top directory is the newsgroup. The next directory tells me the year
and month of the posting. The filename is the article-ID of the article).
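A minimal sketch of building that path, assuming the article-ID is stored without its surrounding angle brackets (the post does not say) and contains no "/" characters; archive_path and the sample ID are hypothetical:

```shell
# Build ./news.group/yy-mm/article-id for one article.
archive_path() {
    group=$1    # newsgroup, e.g. comp.unix.wizards
    yymm=$2     # year and month of posting, e.g. 89-06
    id=$3       # Message-ID, e.g. <1234@host.UUCP>
    # Assumption: drop the angle brackets so the ID is a plain filename.
    id=${id#"<"}
    id=${id%">"}
    printf './%s/%s/%s\n' "$group" "$yymm" "$id"
}

archive_path comp.unix.wizards 89-06 '<1234@host.UUCP>'
# prints ./comp.unix.wizards/89-06/1234@host.UUCP
```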

Each newsgroup has a log file
	./LOGS/news.group
containing a one-line summary of every article: its filename and subject
line. When I archive the large, old newsgroups to tape, the log file is
renamed by appending the current date to its filename.
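The log handling can be sketched in a few lines of shell. This assumes a tab between the filename and the subject, and a yymmdd date suffix on rotated logs; neither detail is given in the post, and log_article and rotate_log are hypothetical names:

```shell
# Append one "filename<TAB>subject" line to the newsgroup's log.
log_article() {
    group=$1 file=$2 subject=$3
    printf '%s\t%s\n' "$file" "$subject" >> "./LOGS/$group"
}

# After archiving a group to tape, rename its log with today's date appended.
rotate_log() {
    group=$1
    mv "./LOGS/$group" "./LOGS/$group.$(date +%y%m%d)"
}

# Usage (run from the top of the archive):
#   log_article comp.unix.wizards comp.unix.wizards/89-06/1234@host.UUCP 'Long filenames'
#   rotate_log comp.unix.wizards
```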

There are many advantages to this scheme. Articles are always at a known
depth below the top directory (one directory named comp.binaries.ibm.pc.d
instead of the nested comp/binaries/ibm/pc/d). Since the filename is the
article-ID, there is a simple way to determine whether an article has been
archived twice. The filename contains all of the information needed, so I
don't have to search another file to determine the name for the archive.
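With every article exactly three levels below the top (group, month, article-ID), a single glob reaches the whole archive, and the duplicate check reduces to `sort | uniq -d` on the filenames. A sketch, with find_duplicates as a hypothetical name; note that an article archived under two different newsgroups is also reported:

```shell
# Print every article-ID that is stored more than once under the archive root.
# Relies on the fixed layout ROOT/news.group/yy-mm/article-id; the log files
# in ROOT/LOGS are only two levels deep, so the glob skips them.
find_duplicates() {
    root=${1:-.}
    for f in "$root"/*/*/*; do
        [ -f "$f" ] && basename "$f"
    done | sort | uniq -d
}
```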

I now have the following pieces of data available:
	The newsgroup
	The year and month of the posting
	The article-ID
	The machine the article was posted from
	The filename of the article on the disk or tape
If the article has been archived to tape, I also have:
	The name of the tape the archive is on
	The date I created that tape

Now ALL of the above information is stored in filenames. The log file
holds only the filename and the subject line. I don't need files to keep
information about files, especially when I have to keep track of 400,000 files.
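Reading those fields back out of a path needs nothing but the shell's own parameter expansion. A sketch, assuming the ./news.group/yy-mm/article-id layout and an article-ID of the form number@host (describe_article is a hypothetical name):

```shell
# Split an archive path into the fields encoded in it.
describe_article() {
    rel=${1#./}            # drop a leading ./
    group=${rel%%/*}       # 1st component: the newsgroup
    rest=${rel#*/}
    yymm=${rest%%/*}       # 2nd component: year and month of posting
    id=${rest#*/}          # 3rd component: the article-ID
    host=${id#*@}          # assumption: the posting machine follows the @
    printf 'group=%s month=%s id=%s host=%s\n' "$group" "$yymm" "$id" "$host"
}

describe_article ./comp.unix.wizards/89-06/1234@host.UUCP
# prints group=comp.unix.wizards month=89-06 id=1234@host.UUCP host=host.UUCP
```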

My database queries are done with grep, cat and find. Once I find the
files I am interested in, a very simple awk command does whatever I need
with them.
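Typical queries of this kind can be wrapped as two small shell functions. This is an illustrative sketch, not the author's scripts: subjects_matching and where_is are hypothetical names, and each log is assumed to hold "filename<TAB>subject" lines:

```shell
# Log lines (filename<TAB>subject) whose subject or filename matches PATTERN.
subjects_matching() {   # usage: subjects_matching GROUP PATTERN
    grep -i -- "$2" "./LOGS/$1"
}

# Where on disk is a given article-ID stored?
where_is() {            # usage: where_is ARTICLE-ID
    find . -type f -name "$1" -print
}

# Usage: page through every article whose subject mentions long filenames.
#   more `subjects_matching comp.unix.wizards 'long filename' | awk -F'\t' '{print $1}'`
```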

In short, having hundreds of thousands of files with 30-character names
(which is NOT gargantuan in my mind) gives me a simple, elegant method
of organizing data.

Doing the same job on a machine with the archaic 14-character limit
would have been more difficult, more complex, less flexible
and less efficient.

--
Bruce G. Barnett	<barnett at crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!crdgw1.ge.com!barnett barnett at crdgw1.UUCP