Standards Update, ANSI X3B11.1: WORM File Systems

Wed Sep 19 09:37:37 AEST 1990

Submitted-by: jsh at usenix.org (Jeffrey S. Haemer)

           An Update on UNIX*-Related Standards Activities

                            September 1990

                 USENIX Standards Watchdog Committee

          Jeffrey S. Haemer <jsh at usenix.org>, Report Editor

ANSI X3B11.1: WORM File Systems

Andrew Hume <andrew at research.att.com> reports on the July 17-19, 1990
meeting in Murray Hill, NJ:

Introduction

X3B11.1 is working on a standard for file interchange on write-once
media (both sequential and non-sequential (random access)): a portable
file system for WORMs.  The fifth meeting was held at Murray Hill, NJ
on July 17-19, 1990.  We adopted a working paper and set to work on a
list of issues suggested by the chair.

Data Compression

Despite the huge capacities of WORM disks, people always want more.
Data compression is an easy way to supply more, and on current machine
architectures, probably can speed data access by trading CPU cycles
for I/O bandwidth.  Its main problem is that you need to support more
than one algorithm and thus, you need some way to specify algorithms.
This is a purely administrative issue, but luckily, it appears that X3
may soon act as a registry for compression algorithms (driven by the
need to register compression algorithms for IBM 3480 cartridge tape
work in X3B5).  (How does this fit in with the rumblings about
compress from POSIX.2?  I'm not certain.  I think being listed in the
registry means giving up patent rights or allowing liberal licensing,
but maybe not.  After all, the CD formats are now an ISO standard, but
I still think you have to be licensed to make them.)

Path Tables and Extended Attributes

Path tables were removed from the working paper.  We agreed to support
hard and symbolic links.  The next question was how to handle
``secret'' files: files primarily intended for system use.  Examples
might include the file describing free space, associated files (like
the resource fork of a Macintosh file), and extended attributes (of a
Microsoft HPFS file).  We agreed that the latter two cases should be
handled by regular files that probably are not in the directory tree
but are pointed to by the ``inode'' for a file.  (Note that this
implies there is a way to scan all the files in a volume set without
traversing the directory tree(s), analogous to running down the inodes
in UNIX.)

__________

  * UNIX is a Registered Trademark of UNIX System Laboratories in
    the United States and other countries.

Given this, we have decided to support extended attributes as a
``secret'' or system file (and probably include pointers to things
like resource forks as those attributes).  This also gives us an
extensible way of handling non-standard or non-essential inode fields.
One of the important tasks remaining is to decide which fields are
more-or-less mandatory (such as modify time, owner) and which can
safely be pushed off into the extended attributes (access control
lists, file valid after date).  Please send us your suggestions!

Space Allocation and Management

We agreed that we have to support preallocating space for files,
freeing some or all of that space and then reusing that space for
other files.  After much discussion about extent lists and bit maps,
we compromised on a scheme based on extent lists (the details to be
worked out by the working paper editor).  The idea is that the
free space is described by an extent list (of small but specifiable
size) of the ``best'' (probably largest) free spaces, and if this
overflows, ``worst'' free spaces are added to a system file
representing all the free spaces not in the above extent list.
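
As a rough illustration of the scheme just described (not taken from
the working paper, whose details were still to be worked out), here
is a small Python sketch.  The class name FreeSpace, the max_extents
limit, reading ``best'' as ``largest'', and modelling the overflow
system file as a plain list are all assumptions made for the example.

```python
# Hypothetical sketch: a bounded list of the largest free extents,
# with the smaller ones spilling into an overflow "system file".
import heapq


class FreeSpace:
    def __init__(self, max_extents):
        self.max_extents = max_extents  # the "small but specifiable size"
        self.best = []                  # min-heap of (length, start)
        self.overflow = []              # stands in for the system file

    def free(self, start, length):
        """Record a newly freed extent."""
        heapq.heappush(self.best, (length, start))
        if len(self.best) > self.max_extents:
            # Evict the smallest ("worst") extent to the overflow file.
            self.overflow.append(heapq.heappop(self.best))


fs = FreeSpace(max_extents=2)
for start, length in [(0, 10), (100, 500), (700, 3), (900, 50)]:
    fs.free(start, length)

best = sorted(fs.best, reverse=True)   # largest extent first
overflow = sorted(fs.overflow)
print(best, overflow)
```

With a limit of two extents, the 500- and 50-block extents stay in
the list and the 3- and 10-block extents end up in the overflow file.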

Checksums

It was decided that all system data structures would include a 16-bit
checksum (CRC-16).  We anticipate that most errors would be transient
(cabling or memory) and not be media errors.
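
For readers unfamiliar with the technique, the following Python
sketch shows how a 16-bit CRC can seal and check a structure.  The
report says only ``CRC-16''; the particular variant used here
(reflected polynomial 0xA001, initial value 0), the helper names seal
and verify, and storing the checksum low byte first are assumptions
for illustration, not decisions of the committee.

```python
# Sketch only: one common CRC-16 variant, chosen as an assumption.
def crc16(data, crc=0x0000):
    """Bitwise CRC-16 (reflected polynomial 0xA001, initial value 0)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def seal(structure):
    """Append the checksum, low byte first (an arbitrary choice here)."""
    return structure + crc16(structure).to_bytes(2, "little")


def verify(sealed):
    """Recompute the CRC over the body and compare with the stored one."""
    return crc16(sealed[:-2]) == int.from_bytes(sealed[-2:], "little")


record = seal(b"volume header")
damaged = bytes([record[0] ^ 0x01]) + record[1:]   # flip one bit
print(verify(record), verify(damaged))
```

A single flipped bit, the kind of transient error anticipated above,
is always caught by a CRC of this form.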

Multi-Volume Sets

I had thought the last meeting had settled just about all the
questions about multi-volume sets; I was wrong.  It took most of a day
to agree on the following:

   - You have to have the last volume in order to grok the whole
     volume set (access any/all of the directories and files).

   - You can extend volume sets at any time.  This and the last item
     taken together imply the existence of ``terminal'' volumes (which
     can act as master volumes of a volume set) and ``nonterminal''
     volumes (the rest).  For example, if I extend a single-volume
     volume set by two volumes, then volumes 1 and 3 are terminal and
     volume 2 is not.

   - You can extract file data from any volume by itself.  This is
     meant only for disaster recovery (I dropped the master volume
     down the stairwell) and doesn't imply any requirements on
     directory tree information (much as fsck restores unattached
     inodes to /lost+found).

   - Volumes can refer to data (say, extents) on other volumes (both
     earlier and later volumes).  Preallocated space on any volume in
     a volume set can be returned for future reuse.

   - The address space of logical blocks for the volume set will be 48
     bits; 16 bits for the volume number and 32 bits for the logical
     block number within a volume.  Media can be big (200GB helical
     scan media exist now) so 32 bits may seem barely big enough, but
     in such cases you can use a big logical block size.  For example,
     a logical block size of 16KB implies a limit of 64 terabytes per
     volume; this should be ample for a few years.
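
The address arithmetic in the last item can be checked with a few
lines of Python.  The 16/32-bit split comes from the text; putting
the volume number in the high-order bits and the function names pack,
unpack, and volume_limit are assumptions made for this sketch.

```python
# Sketch of the 48-bit address: 16-bit volume, 32-bit block number.
VOLUME_BITS = 16
BLOCK_BITS = 32


def pack(volume, block):
    """Combine volume and block numbers into one 48-bit address."""
    if not 0 <= volume < (1 << VOLUME_BITS):
        raise ValueError("volume number out of range")
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block number out of range")
    return (volume << BLOCK_BITS) | block   # assumed layout


def unpack(address):
    """Split a 48-bit address back into (volume, block)."""
    return address >> BLOCK_BITS, address & ((1 << BLOCK_BITS) - 1)


def volume_limit(block_size):
    """Largest volume, in bytes, addressable with this block size."""
    return (1 << BLOCK_BITS) * block_size


addr = pack(3, 0x12345678)
limit_tb = volume_limit(16 * 1024) // 2**40   # terabytes (2**40 bytes)
print(hex(addr), unpack(addr), limit_tb)
```

With 16KB logical blocks, 2**32 blocks come to 2**46 bytes, which is
the 64 terabytes quoted above.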

Defect Management

We spent a lot of time on this and learned a lot, but basically put it
off to the next meeting.  What we mean by ``defect management'' is
``How do we deal with write errors from the file system's point of
view?'' (We ignore the disk controller and the device driver, both of
which do some unknown amount of more-or-less transparent error
management.)

We discussed the ``sane'' approach: insert a layer between the file
system and the device that handles errors, allowing the file-system
code to assume an error-free interface.  This apparently good idea is
ruled out by slip-sectoring, a (to my mind bogus) technique, which
says, ``if writing block n fails, then try subsequent blocks (n+1,
n+2, ...) until we succeed.''  Slip-sectoring is mainly used to enhance
performance (it does ensure that blocks are more-or-less contiguous),
and some disk controllers use it as their error-management technique.
(This really screws up your logical address space; it is legitimate
for a SCSI disk, your typical error-free, logical-address-space disk
interface, to write logical block 5 at physical block 5, then logical
block 1 at physical block 4 (1-3 were write errors), then disallow I/O
to logical blocks 2, 3, and 4 because there is no place to put them -
these blocks just vanish!)
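
The vanishing-block scenario in that parenthesis can be reproduced
with a toy model.  The Python sketch below is an assumption-laden
simplification (a five-block disk, blocks numbered from 1, a
hypothetical SlipSectorDisk class); real controllers are more
involved, but the effect on the logical address space is the same.

```python
# Toy model of slip-sectoring on a 5-block disk (illustrative only).
class SlipSectorDisk:
    def __init__(self, size, bad):
        self.size = size        # physical blocks are numbered 1..size
        self.bad = set(bad)     # physical blocks that fail on write
        self.used = set()       # physical blocks already written
        self.where = {}         # logical block -> physical block

    def write(self, logical):
        """Try physical block n, then n+1, n+2, ... until one works."""
        for physical in range(logical, self.size + 1):
            if physical in self.bad or physical in self.used:
                continue
            self.used.add(physical)
            self.where[logical] = physical
            return physical
        raise OSError("no place to put logical block %d" % logical)


disk = SlipSectorDisk(size=5, bad={1, 2, 3})
disk.write(5)                   # lands on physical block 5
disk.write(1)                   # 1-3 fail, so it slips to block 4

lost = []
for n in (2, 3, 4):
    try:
        disk.write(n)
    except OSError:
        lost.append(n)          # nowhere left to slip to
print(disk.where, lost)
```

After the two successful writes, logical blocks 2, 3, and 4 have no
physical home even though nothing was ever written to them.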

As preparation for the next meeting, Don Crouse, who deals mainly with
high-end machines like Crays and large IBMs, is writing a position
paper on performance, and members of the committee, many of whom are
drive manufacturers or integrators, are collecting estimates of error
rates we have to deal with.  (This matters; I see one bad block out of
100,000, but some people have used drives with a bad block in every
100.) The problem is that WORMs have really slow seek times, and when
you are pouring a 50MB/s Cray channel at a set of WORMs, you can't
afford to spend 1-2 seconds seeking to the bad block area.  I
personally think we should just do regular bad-block mapping (like
most SMD disk drivers) out of a special system file, and people with
performance concerns should arrange to have this space spread over the
disk.
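
A sketch of what such bad-block mapping might look like, in Python.
Nothing here comes from the committee: the BadBlockMap class, keeping
the map in memory rather than in a system file, and choosing the
nearest spare to limit seek distance are all assumptions made to
illustrate why spreading the spare space over the disk helps.

```python
# Hypothetical bad-block map with spares spread across the disk.
import bisect


class BadBlockMap:
    def __init__(self, spares):
        self.spares = sorted(spares)   # unused replacement blocks
        self.remap = {}                # bad block -> replacement block

    def replace(self, bad):
        """Map a bad block to the nearest unused spare block."""
        if not self.spares:
            raise OSError("no spare blocks left")
        i = bisect.bisect_left(self.spares, bad)
        # Only the neighbours on either side can be the nearest.
        candidates = self.spares[max(i - 1, 0):i + 1]
        spare = min(candidates, key=lambda s: abs(s - bad))
        self.spares.remove(spare)
        self.remap[bad] = spare
        return spare

    def resolve(self, block):
        """Return where a block's data actually lives."""
        return self.remap.get(block, block)


bbm = BadBlockMap(spares=[1000, 5000, 9000])
first = bbm.replace(4800)       # nearest spare is 5000
second = bbm.replace(8000)      # nearest remaining spare is 9000
print(first, second, bbm.resolve(4800), bbm.resolve(42))
```

With a single spare area at one end of the disk, every remapped block
would cost a long seek; with spares spread out, the replacement is
usually close by.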

Endian-ness

A poll was taken of who really cared which way integer fields were
stored; the results were LSB - 1, MSB - 1, Don't Care - 11.  Mandating
either LSB or MSB is awkward, since it puts half the systems out
there at a competitive (performance) disadvantage (though I am
skeptical of whether it's significant).  Even though we're specifying
an interchange standard, the group felt that most interchange would be
between systems of the same endian-ness, so we should, somehow, allow
native byte order.  Accordingly, we agreed that endian-ness will be
specified in the volume header (for the whole volume set).  In
retrospect, I think this was silly; we should have just picked one
way.  In order that everyone important be evenly disadvantaged, we
could have used some byte order like 3-0-1-2 that no one uses.
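
To make the agreed approach concrete, here is a Python sketch of a
reader that honours a byte-order flag in the volume header.  The
header layout (a one-byte flag, b"L" or b"M", followed by a 32-bit
block size) and the function names are invented for the example; the
report does not say how the flag is encoded.

```python
# Sketch: a volume header that records its own byte order.
import struct

PREFIX = {b"L": "<", b"M": ">"}   # hypothetical flag values


def make_header(byte_order, block_size):
    """Write the flag, then one integer field in that byte order."""
    flag = b"L" if byte_order == "little" else b"M"
    return flag + struct.pack(PREFIX[flag] + "I", block_size)


def read_header(raw):
    """Check the flag first, then decode the field accordingly."""
    prefix = PREFIX[raw[:1]]
    (block_size,) = struct.unpack(prefix + "I", raw[1:5])
    return block_size


le = make_header("little", 16384)
be = make_header("big", 16384)
print(le, be, read_header(le), read_header(be))
```

Writers get to use their native order, but every reader has to carry
both decoding paths, which is the cost of not just picking one way.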

Finale

The committee is trying to nail down a firm proposal for balloting.
We anticipate a substantial amount of change at the next meeting (Oct
16-18 in Nashua, NH) and have reserved time (Dec 11-13, but no place)
for an additional meeting so that we can ballot after the following
meeting (Jan 29-31, Bay area).  We now have a working paper (available
by the end of September or so); I think it likely we can meet this
schedule, but who knows.

Anyone interested in attending any of the above meetings should
contact either the chairman, Ed Beshore (edb at hpgrla.hp.com), or me
(andrew at research.att.com, research!andrew, (908)582-6262).  I am also
soliciting your comments on necessary inode fields and defect
management.  I will present anything you give me at the next meeting.

Volume-Number: Volume 21, Number 116