VMS vs. UNIX file system

Mon Sep 19 03:06:54 AEST 1988

In the first place, it's not obviously an either/or situation. I
suspect that VMS's RMS could be implemented on top of Unix with little
or no change to the O/S (although performance tuning would have to
trade off RMS's asynchronous read-ahead/write-behind against Unix's
buffer cache, which accomplishes much the same basic thing [i.e. the
block you want next is highly likely to be off the disk and in memory
by the time you need it], albeit in a different manner with different
considerations).

I wouldn't be at all shocked to see DEC announce (essentially) RMS
under Ultrix (and I'll bet a dollar someone is working on this.) Fine
idea, as long as it's not in the OS.

One problem with structured files that's easy to see is whether
information stored in the file to represent the structure is part of
the file or not.

For example, if in a variable-length, blocked format you store the
length of each record as a preceding 16-bit field, is the size of the
file the size of all its data + NRECORDS*2 (2 bytes per record)? Or
just the size of the data (that is, what does a file status query
return)?
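
To make the two candidate answers concrete, here is a minimal sketch.
The layout (a 2-byte little-endian length in front of each record) and
the names pack_records, data_size and raw_size are made up for
illustration; this is not claimed to be RMS's actual on-disk format.

```python
import struct

def pack_records(records):
    """Frame each record with a 2-byte length prefix (hypothetical layout)."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))  # 16-bit length field
        out += rec                          # the record's own data
    return bytes(out)

records = [b"alpha", b"be", b"gamma!!"]
raw = pack_records(records)

# Answer 1: only the user's data counts.
data_size = sum(len(r) for r in records)

# Answer 2: the structure bytes count too (data + NRECORDS*2).
raw_size = len(raw)

print(data_size, raw_size)
```

Either number is a defensible "file size"; the trouble starts when
different tools quietly pick different ones.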

That doesn't seem terribly important at first (who cares, choose
a solution and stick to it) until one wants to access the thing
as a raw file (something always trivial to do in Unix's scheme.)

Now, is the 16-bit field counted in a file position seek? Can I safely
take two positions, POS1 and POS2 (byte offsets into the file, a la
ftell or lseek), and subtract them, perhaps then allocating and
copying the data? Or might the result be larger (the OS adds in the
16-bit fields) or even incorrect (POS2 should have been incremented by
NRECORDS*2, but I can't easily calculate NRECORDS in advance)?
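
A rough sketch of the subtraction problem, using the same made-up
2-byte length prefix as an assumption (raw_offset and the sample
records are hypothetical, not any real access method's interface). It
maps a position in the data alone to a position in the framed file and
shows that the two differences disagree:

```python
import struct

PREFIX = 2  # bytes of length field per record (assumed layout)

def pack_records(records):
    """Frame each record with a 2-byte little-endian length prefix."""
    return b"".join(struct.pack("<H", len(r)) + r for r in records)

def raw_offset(records, logical_pos):
    """Map an offset into the data alone to an offset in the framed file."""
    remaining = logical_pos
    raw = 0
    for rec in records:
        raw += PREFIX                 # skip this record's length field
        if remaining < len(rec):
            return raw + remaining    # position falls inside this record
        remaining -= len(rec)
        raw += len(rec)
    raise ValueError("position past end of data")

records = [b"alpha", b"be", b"gamma!!"]
data = b"".join(records)
raw = pack_records(records)

pos1, pos2 = 3, 9                     # offsets as the user thinks of them
logical_span = pos2 - pos1
raw_span = raw_offset(records, pos2) - raw_offset(records, pos1)

print(logical_span, raw_span)
```

The two spans differ by 2 bytes for every record boundary between the
positions, which is exactly the number you can't know without walking
the records.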

I'm not sure I'm claiming that Unix solves any of this, other than by
laying things out so bare-bones and without OS interpretation that
it's totally up to the user: no hand-to-hand combat with a record
management system required.

Anyhow, I may not be expressing myself very well, but I have used VMS
and IBM record access methods enough over the years to know that
sometimes they can drive you to tears (usually because the OS feels it
has a better idea of what you are doing than the programmer does,
and modifies or otherwise "corrects" your requests.)

What's far more important, in my experience, is to have an orderly set
of access methods and to use them only where they are truly justified
(i.e. "it's faster" is not a good enough excuse if 99% of the actual
applications will run faster than human response time with either
method, naive or sophisticated).

I remember, for example, when the VMS HELP files went from a very
simple, textual format to their current library format, and it made
working with them in new and creative ways nearly impossible. (I had
written a full-screen interface to the VMS help files in TECO, no
kidding, which was nearly impossible to salvage; I never bothered.)
I'm not sure the changeover was really much of an improvement: it sped
up something which was fast enough already and added a lot of
complexity where it was unappreciated (adding a new help topic became
more complicated, etc.)

Not a flame, just trying to emphasize my point: it's good to have
access methods, but they tend to lead people astray into using them
just to avoid scanning a file, when scanning would perform fine and
would greatly simplify later maintenance (typically, the file can be
manipulated with a text editor, etc.)
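
For what it's worth, the naive scan I have in mind is about this much
code. This is only a sketch under an assumed layout: a plain-text help
file in which each topic starts with a line of the form "1 NAME" and
the body follows. The function name find_topic and the sample text are
invented for illustration.

```python
import io

def find_topic(lines, topic):
    """Return the body lines of `topic` by scanning from the top, or None."""
    body = []
    found = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("1 "):          # a topic header line
            if found:
                break                      # next topic begins: we are done
            found = line[2:].strip().upper() == topic.upper()
        elif found:
            body.append(line)
    return body if found else None

help_text = (
    "1 COPY\n"
    "Copies a file.\n"
    "1 DELETE\n"
    "Deletes a file.\n"
    "Asks first.\n"
)

print(find_topic(io.StringIO(help_text), "delete"))
print(find_topic(io.StringIO(help_text), "rename"))
```

Adding a topic to such a file means typing two more lines in an
editor; no librarian utility required.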

	-Barry Shein, ||Encore||