VMS vs. UNIX file system

Barry Shein bzs at encore.UUCP
Wed Sep 14 00:27:32 AEST 1988


>What does the added complexity of having to deal with RMS, FDL, CONVERT,
>etc., buy?
>-- 
>Dominick Samperi, NYC

There are pluses and minuses in both approaches. The intention of
formalizing a bunch of file access methods is to put the code
wherever the vendor (designer) believes it will do the most good. For
example, by promising to access some file only sequentially, you allow
it to be stored in a manner optimal for that usage. Similarly, an
indexed file can have its read methods set up, perhaps maintaining two
separate caches (one for indices, one for data), for optimal access.

It also means that you go through some standard set of routines with a
standard set of assumptions (e.g. I can open an ISAM file, knowing a
few things about it, without asking for details about how it's stored;
if one builds one's own ISAM format on top of a bag-of-bytes file, it
may not be at all obvious how to read it without access to the
original program which wrote it.)
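
(Just to make the contrast concrete, here is a throwaway C sketch; it
isn't RMS or any real access-method library, and the 32-byte record
length is a made-up constant. The point is that with a bag-of-bytes
file the layout knowledge lives in the program: a reader has to know,
from somewhere outside the file, what the record size is, whereas a
formalized access method stores that promise with the file where every
utility can see it.)

    #include <stdio.h>
    #include <string.h>

    #define RECLEN 32   /* the "magic number": nothing in the file records it */

    int main(void)
    {
        /* Writer: three fixed-length records packed into a plain byte stream. */
        FILE *fp = fopen("demo.dat", "w+b");
        const char *names[] = { "alpha", "beta", "gamma" };
        char rec[RECLEN];
        for (int i = 0; i < 3; i++) {
            memset(rec, 0, RECLEN);
            strncpy(rec, names[i], RECLEN - 1);
            fwrite(rec, RECLEN, 1, fp);
        }

        /* Reader: fetching the third record (index 2) works only because we
         * already know RECLEN.  Without that number the file is just
         * undifferentiated bytes; a declared access method would carry the
         * record format along with the file itself. */
        fseek(fp, 2L * RECLEN, SEEK_SET);
        fread(rec, RECLEN, 1, fp);
        printf("record 2: %s\n", rec);

        fclose(fp);
        return 0;
    }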

The downside is that these access methods tend to get used.

What I mean is, used unnecessarily where bag-of-bytes files would do
just fine and cause much less confusion.

For example, on an earlier release (probably 1.6) of VMS I wanted to
edit a file produced by RUNOFF (to do a few global changes so
underlining or some such would print properly on my printer.) Not as
easy as it sounded: EDT refused to load this print file for editing,
complaining about an illegal file type.

One could point the finger at EDT and say it was deficient in not
handling enough file formats, but I tend to think that, barring
super-human effort, the problem was inherent in the design
environment: it would be hard to properly edit every file type that
was allowed (last I checked, CONVERT still couldn't perform some
reasonable-looking conversions.) I believe TECO did the job fine, but
I was pretty shocked at not being able to edit this fairly
plain-looking text file.

It wasn't the *data* which was preventing loading this into EDT (as
with, say, trying to load an a.out into VI, which wouldn't work too
well either, but for a different reason); it was merely a bit
somewhere identifying this as a print file or some such nonsense, so
EDT kicked it out without even trying. Such problems were ubiquitous
(at least it always seemed like someone was coming to me trying to
work around a similar problem where utilities wouldn't cooperate.)

Under IBM systems with a similar record-oriented philosophy, I
remember real panic if we couldn't find the original parameters under
which a file was created. It basically couldn't be opened anymore
unless you could produce the right magic numbers it was created with
(blocking factors, etc.) I'm sure some wizardly types could have
solved that directly, but it sure wasn't obvious to us, other than
guessing numbers, paying real money to watch perhaps dozens of tries
go down the drain, and feeling kind of foolish and seriously out of
control.

The problem with the Unix "unstructured" approach is that either you
use one of the (very few) library routines (dbm is a major one, so are
the object deck readers in SYSV) or you roll your own; each
application ends up with its own way of storing data (compare termcap
with passwd with inittab with crontab with ...), often not terribly
well documented or efficient (agreed, efficiency is often a poor
excuse for obscurity.)
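
(For what it's worth, the dbm case looks roughly like this. This is a
minimal sketch against the ndbm flavor of the interface, the details
vary a bit between dbm/ndbm implementations, and the tty/getty mapping
is just an invented example; error checking is mostly omitted.)

    #include <ndbm.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* One of the few stock Unix "access method" libraries: a hashed
         * key/value file whose layout you don't have to invent yourself. */
        DBM *db = dbm_open("mapping", O_RDWR | O_CREAT, 0644);
        if (db == NULL)
            return 1;

        datum key, val, found;
        key.dptr = "tty01";  key.dsize = strlen("tty01");
        val.dptr = "getty";  val.dsize = strlen("getty");
        dbm_store(db, key, val, DBM_REPLACE);

        found = dbm_fetch(db, key);
        if (found.dptr != NULL)
            printf("%.*s\n", (int)found.dsize, (char *)found.dptr);

        dbm_close(db);
        return 0;
    }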

It's all a balancing act. In my ideal world there would be a variety
of standardized access methods, and you would avoid using them like
the plague, especially in general system utilities; simple byte-stream
files should account for most input and output (a la Unix), but for
those occasional, carefully justified problems, access methods could
be resorted to. Also, the operating system would know as little about
them as possible (e.g. opening any file as a byte-stream would do
something reasonable and *never* return an error.)
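
(On the Unix side, that "never return an error" property is more or
less what you already get. The little C sketch below, nothing special
assumed, will read any file at all, an a.out, a dbm database,
whatever, as a stream of bytes; what you do with the bytes is your
problem, but the open itself isn't refused because of a type bit.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Dump any file as raw bytes: the kernel never cares what "kind" of
     * file it is, so an editor or filter can at least get at the contents. */
    int main(int argc, char **argv)
    {
        char buf[8192];
        ssize_t n;
        int fd;

        if (argc < 2)
            return 1;
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror(argv[1]);    /* permissions or a bad name, not a "file type" */
            return 1;
        }
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(1, buf, n);
        close(fd);
        return 0;
    }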

	-Barry Shein, ||Encore||
