AFS - more info

Mike Kazar kazar+ at andrew.cmu.edu
Sat Jun 10 00:51:39 AEST 1989


[Disclaimer: I'm the AFS development manager at Transarc Corp, and thus
have a serious interest in AFS.]

Things seem confusing because Melinda Shore and Bill Sommerfeld are
comparing two *very* different versions of AFS.  Mt. Xinu's version is a
very old system sent unofficially to Mt. Xinu by the CMU CS department,
while MIT's Project Athena is running the latest and greatest version of
AFS direct from us.  Here are some of their differences:

1.  The old cache manager (the client-end code) is implemented as a
user-level process with limited internal concurrency.  It has 2 or 3
light-weight threads to handle user-generated file system requests.  If
these threads are busy, the user waits.  If the Unix process implementing
the threads is busy, the user waits.

In the new implementation, AFS is just another (Sun) virtual file
system, sitting in the kernel.  There are essentially no external
concurrency constraints imposed.  Trivial operations (such as stat of a
cached file) run more than 10 times faster since a procedure call to the
VFS "afs_getattr" function is a lot faster than sending an IPC message
to a user-level process and waiting for a reply.
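
To make the contrast concrete, here is a rough sketch of the two request
paths.  This is illustrative C of my own, not actual AFS source; aside from
afs_getattr (the VFS entry point mentioned above), the names and structures
are hypothetical.

/* Hypothetical sketch (not actual AFS code) contrasting the two request
 * paths: an in-kernel VFS call vs. IPC to a user-level cache manager. */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* New path: AFS is a virtual file system in the kernel, so stat() of a
 * cached file reduces to a procedure call that copies cached attributes. */
struct vcache { struct stat attrs; int valid; };

int afs_getattr(struct vcache *vc, struct stat *out)
{
    if (!vc->valid)
        return -1;          /* cache miss: would contact the file server */
    *out = vc->attrs;       /* cache hit: no message, no context switch */
    return 0;
}

/* Old path: the cache manager is a user-level process, so the same stat()
 * becomes a message plus a blocking wait for the reply, queued behind
 * whatever the cache manager's few threads are already doing. */
int old_getattr(int ipc_fd, const char *path, struct stat *out)
{
    char req[256];
    snprintf(req, sizeof req, "GETATTR %s", path);
    if (write(ipc_fd, req, strlen(req) + 1) < 0)
        return -1;
    return read(ipc_fd, out, sizeof *out) == sizeof *out ? 0 : -1;
}

int main(void)
{
    struct vcache vc = { .valid = 1 };
    struct stat st;
    vc.attrs.st_size = 4096;
    if (afs_getattr(&vc, &st) == 0)
        printf("cached size: %ld bytes\n", (long)st.st_size);
    return 0;
}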

2.  The old system uses an inferior RPC package, with a separate file
transfer protocol invoked to transfer data blocks larger than a packet.

The new RPC integrates these two types of operations, cutting down
considerably (a factor of 2.5 or so) on the actual number of packets
sent in the most common cases.  In addition, we've learned a lot more
about file transfers, and have stolen some ideas from people who've done
TCP/IP improvements.  Thus our new RPC runs faster than the old file
transfer protocol, on top of the reduction in overhead from having just
one protocol.
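
As a purely illustrative toy (the packet accounting below is my own rough
assumption, not the actual behavior of either protocol), here is the shape
of the saving when the bulk data rides on the RPC reply instead of going
over a second protocol:

/* Toy packet-count model, purely illustrative: the exact counts here are
 * made up, but they show the shape of the saving when bulk data is carried
 * on the RPC stream instead of by a separate transfer protocol. */
#include <stdio.h>

#define PKT 1400            /* assumed data bytes per packet */

/* Old scheme: a status RPC (request + reply), then a separate transfer
 * protocol with its own setup, data packets, and acknowledgement. */
static long old_packets(long bytes)
{
    long data = (bytes + PKT - 1) / PKT;
    return 2 /* status RPC */ + 2 /* transfer setup */ + data + 1 /* ack */;
}

/* New scheme: one call; the reply stream carries status and data together. */
static long new_packets(long bytes)
{
    long data = (bytes + PKT - 1) / PKT;
    return 1 /* request */ + data /* reply stream */ + 1 /* final ack */;
}

int main(void)
{
    long sizes[] = { 1000, 8000, 64000 };
    for (int i = 0; i < 3; i++)
        printf("%6ld bytes: old %ld packets, new %ld packets\n",
               sizes[i], old_packets(sizes[i]), new_packets(sizes[i]));
    return 0;
}

For a file that fits in a single packet, the toy numbers work out to roughly
a 2:1 saving; the real protocols differ in detail, but the direction is the
same.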

3.  The old system is an administrative nightmare.  All sorts of ad hoc
databases had to be maintained via text editors, and then run through
obscure programs to convert them to internal forms.  These binary
databases would then propagate out to the other file servers over a
period of tens of minutes, during which time things often looked a
little inconsistent.

In the new system, these databases are implemented by replicated
transactional database servers.  System administrators update these
databases by issuing commands from their own workstations that make
authenticated RPCs to the database servers.
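
In rough outline (this is a toy illustration, not the real database server
interfaces), an administrative update now looks like the following: the
request carries the administrator's credentials, is applied as one
transaction, and lands on every replica rather than waiting for periodic
propagation of an edited text file.

/* Hypothetical toy, not the real AFS database interfaces: it just
 * illustrates the flow described above. */
#include <stdio.h>
#include <string.h>

#define NREPLICAS 3

struct user_entry { char name[32]; int id; };

static struct user_entry replica[NREPLICAS][100];   /* one copy per server */
static int nentries;

/* Stand-in for checking the administrator's authentication token. */
static int authenticated(const char *token)
{
    return strcmp(token, "admin-token") == 0;        /* toy check only */
}

/* One "transaction": validate, apply the change, update every replica. */
static int db_add_user(const char *token, const char *name, int id)
{
    if (!authenticated(token))
        return -1;                                   /* reject unauthenticated RPCs */
    struct user_entry e;
    strncpy(e.name, name, sizeof e.name - 1);
    e.name[sizeof e.name - 1] = '\0';
    e.id = id;
    for (int r = 0; r < NREPLICAS; r++)              /* all replicas see the update */
        replica[r][nentries] = e;
    nentries++;
    return 0;
}

int main(void)
{
    if (db_add_user("admin-token", "kazar", 42) == 0)
        printf("user added to %d replicas\n", NREPLICAS);
    return 0;
}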

4.  The old system transferred entire files, and made the reader wait
until the entire file had been received before letting the user see
*any* of the data.  In the new system, files are transferred in 64 Kbyte
chunks, and the user process can read the data as soon as the
appropriate portion of the chunk has been received at the workstation.
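
The read path on the workstation therefore only has to wait for the bytes it
actually needs within the current chunk.  A toy sketch (illustrative names
and structures, not actual cache manager code):

/* Illustrative toy only: a read blocks only until the bytes it needs have
 * arrived in the current 64 Kbyte chunk, not until the whole file is here. */
#include <stdio.h>
#include <string.h>

#define CHUNK (64 * 1024)

struct chunk {
    long  offset;           /* file offset this chunk starts at */
    long  received;         /* bytes of the chunk fetched so far */
    char  data[CHUNK];
};

/* Serve a read from a chunk; return how many bytes were available.
 * A real cache manager would sleep until `received` covers the request. */
static long chunk_read(struct chunk *c, long off, char *buf, long len)
{
    long rel = off - c->offset;
    if (rel < 0 || rel >= c->received)
        return 0;                        /* not yet fetched: caller waits */
    if (len > c->received - rel)
        len = c->received - rel;         /* give what has arrived so far */
    memcpy(buf, c->data + rel, len);
    return len;
}

int main(void)
{
    struct chunk c = { .offset = 0, .received = 8192 };   /* 8 KB arrived */
    char buf[4096];
    memset(c.data, 'x', sizeof c.data);
    long n = chunk_read(&c, 4096, buf, sizeof buf);
    printf("read %ld bytes before the chunk finished arriving\n", n);
    return 0;
}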

So, here's my interpretation of the differences between M & B's posts:

>   1) The [protection] semantics really are different from Unix
>     filesystem semantics.
>
>The use of access control lists is necessary in large-scale
>environments.

Not a rebuttal, true, but certainly a justification for doing something
incompatible with Unix protection given a large environment.  Since
virtually no Unix programs know anything about ACLs, we had to do some
pretty odd things, as compared with straight Unix protection, to get a
reasonably powerful and simple scheme that even novices can use without being
surprised.

>   2)  Directories, which you and I consider to be files, aren't treated as
>	   files by AFS.  *No* caching, which means that you can ls until the
>	   cows come home but the 80th time is not going to be any faster than
>	   the first.
>
>Please check your facts; last I looked, they're cached just like files.
>A significant part of the hair in the AFS client is involved with
>keeping the local copy of a directory in synch with the master copy
>when directory operations are done.

All versions of AFS have always cached directories.  AFS does name to
low-level ID translation on the workstation; there's no way we could
fail to cache directories in any of our releases.  Even a bug of this
magnitude would be instantly visible.
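
Since the directory itself sits in the workstation's cache, translating a
name to its low-level file identifier is a purely local search.  A
hypothetical sketch of that idea (the structures here are mine, not AFS's):

/* Hypothetical sketch: name-to-ID translation done entirely against a
 * cached directory, with no server round trip on a hit. */
#include <stdio.h>
#include <string.h>

struct file_id { long volume; long vnode; };        /* low-level file ID */

struct dir_entry { char name[64]; struct file_id fid; };

struct cached_dir { int nentries; struct dir_entry entry[128]; };

static int dir_lookup(const struct cached_dir *d, const char *name,
                      struct file_id *out)
{
    for (int i = 0; i < d->nentries; i++)
        if (strcmp(d->entry[i].name, name) == 0) {
            *out = d->entry[i].fid;                 /* hit: no network traffic */
            return 0;
        }
    return -1;                                      /* miss within this directory */
}

int main(void)
{
    struct cached_dir d = { .nentries = 1 };
    struct file_id fid;
    strcpy(d.entry[0].name, "notes.txt");
    d.entry[0].fid = (struct file_id){ .volume = 7, .vnode = 123 };
    if (dir_lookup(&d, "notes.txt", &fid) == 0)
        printf("notes.txt -> volume %ld, vnode %ld\n", fid.volume, fid.vnode);
    return 0;
}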

>   3)  Performance.  The whole file is copied over at access time, which
>	   speeds up future file accesses but can turn "grep string *" into a
>	   fairly unpleasant experience.
>
>Yes, but the user process doing the "grep" sees the bits as soon as
>they're available, and doesn't have to wait for them to be written to
>the cache.

It should be clear from the above that the system Shore is using would
perform much worse on a 'grep string *' than the current system.

>   4)  Disk usage.  Because entire files are copied over it can be something
>	   of a disk burner.
>
>True; you want a large enough cache that the "working set" of files
> you normally touch in the period of an hour or two fits in its
> entirety; for normal users, 10MB is probably enough, while for "power
> users" doing kernel builds, 30MB+ is more like it.  [Barry quoted Bill
out of context,
> eliminating the last three lines of Bill's reply; I've restored the
rest -MLK]

10 MB isn't, in my opinion, a disk burner when you realize that you don't
have to keep virtually anything on your workstation disk aside from the
cache and swap space.  For example, most developers of large systems
have hundreds of megabytes of local disk storage so that they can work
effectively.  I've worked for years with only a 40 megabyte disk quite
comfortably, and only switched to a 70 when I upgraded to a new machine
that didn't come with such small disks!  If you're only a casual user of
AFS (rather than someone getting 99% of all your files from AFS), you
can get by with a couple of megabytes of cache.

>   5)  Administration is somewhat (!) complex.  
>
>Agreed, but managing 10 AFS servers is only slightly harder than

They're also talking about totally different administrative procedures
here, folks.  Shore's system is *very* painful to administer. 
Sommerfeld's is an order of magnitude easier to deal with.

So, in short, these folks really are comparing completely different
systems, and that's the main reason for the confusion.


