GNU-tar vs dump(1)

Steve Summit scs at adam.pika.mit.edu
Mon Jan 9 14:14:23 AEST 1989


In article <10797 at rpp386.Dallas.TX.US> jfh at rpp386.Dallas.TX.US (John F. Haugh II) writes:
(with respect to compressing "empty" blocks of zeroes)
>This problem and others can be solved by telling GNU-tar about the file
>system.  There is no reason a system utility shouldn't be aware of the
>system layout.
>How many CPU years are going to be wasted LZW'ing all those sparse blocks
>when a little file system knowledge would have saved us all that grief?

How many person years have been and will be wasted attempting to
port programs which ought to be portable but which contain
gratuitous system dependencies?  Tar can be written portably;
every attempt should be made to do so.  It has already been
asserted (and I'm inclined to believe it) that the time spent
looking for zeroes to compress is inconsequential, particularly
in an I/O intensive program such as tar.
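
To make the cost concrete, here is a minimal sketch of a portable
zero check. It is not code from GNU tar or any other tar; the name
block_is_zero is made up for illustration. It looks only at the
bytes already read into the buffer and knows nothing about the
file system.

```c
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical name, not taken from any tar).
 * Return 1 if all n bytes at buf are zero, 0 otherwise.
 * Uses nothing but the data already in memory, so it is portable
 * to any system with a C compiler.
 */
int block_is_zero(const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] != 0)
            return 0;   /* first nonzero byte ends the scan */
    }
    return 1;           /* every byte was zero (or n was 0) */
}
```

The scan stops at the first nonzero byte, so for most blocks of
real data it costs only a few comparisons. Only blocks that are
all zero (or nearly so) get examined in full.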

A good example of the same problem can be found in diff: a nice,
simple text file utility which ought to be maximally portable,
and is an especially attractive porting target because nothing
like it exists on lesser systems such as VMS and MS-DOS.  Yet
part of its algorithm for distinguishing between text and binary
files involves reading a struct exec from the beginning of the
file and checking for magic numbers, which requires #including
the (very Unix-specific) <a.out.h>.  Doing so is in fact
pointless because the algorithm then goes on (in the absence of a
valid magic number) to look through the beginning of the file for
nonprinting characters, which a.out files are virtually certain
to contain.
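
The portable half of that test might look something like the
sketch below. This is not diff's actual code; the name
looks_binary and the exact rule (anything that is neither
printable nor whitespace counts as binary) are assumptions made
for illustration. It needs only <ctype.h>, not <a.out.h>.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical name, not diff's real code).
 * Return 1 if the first n bytes at buf contain a character that is
 * neither printable nor whitespace, 0 otherwise.  The result depends
 * on the current locale's idea of "printable"; in the default "C"
 * locale any byte above 0x7E is treated as nonprinting.
 */
int looks_binary(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];

        if (!isprint(c) && !isspace(c))
            return 1;   /* found a nonprinting character */
    }
    return 0;           /* nothing suspicious in the sample */
}
```

An a.out header begins with small binary integers, so a scan
like this would normally flag it within the first few bytes,
with no need to know what the magic numbers are.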

Machine- or system-dependent code should be written only as a
last resort, when the need is clear and dire, when no portable
way of writing it can be found, and then only in utilities which
"have a right" to contain such dependencies (adb, fsck, etc.).
Tar is a file interchange program; you'll likely want to get it
working on another system some day so you can transfer things.

(Of course, non-essential system-dependent code, such as a Unix
filesystem empty block check, or diff's magic number detection,
could be surrounded with appropriate #ifdefs.  Unfortunately, it
rarely is, which leaves the eventual porter, if he isn't
experienced and isn't the author, quite uncertain as to how to
proceed, and liable to drop the project.  In the case of the
proposed filesystem knowledge for tar, an #ifdef unix wouldn't
even help, because Unix filesystem formats have been known to
change, and they can't even be assumed to be consistent on one
system any more, given the existence of file system switches and
remote file systems.  Why commit tar to all of these problems?)
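
For what it's worth, the general shape of properly fenced-off
system-dependent code is sketched below. Everything named here
is hypothetical: HAVE_FS_HOLE_INFO stands for some configuration
macro, and fs_block_is_hole for some system-specific shortcut.
The point is only that the portable path is always present and
the program builds and works with the macro undefined.

```c
#include <stddef.h>

/*
 * Illustrative sketch only.  HAVE_FS_HOLE_INFO and fs_block_is_hole()
 * are hypothetical names, not part of any real tar or any real system.
 */
#ifdef HAVE_FS_HOLE_INFO
extern int fs_block_is_hole(int fd, long blockno);
#endif

/* Return 1 if the block is empty (all zero bytes), 0 otherwise. */
int block_is_empty(int fd, long blockno, const char *buf, size_t n)
{
    size_t i;

#ifdef HAVE_FS_HOLE_INFO
    /* Optional shortcut: ask the (hypothetical) system-specific routine. */
    if (fs_block_is_hole(fd, blockno))
        return 1;
#else
    /* Portable build: these arguments are not needed. */
    (void)fd;
    (void)blockno;
#endif

    /* Portable path, always compiled in: just look at the data. */
    for (i = 0; i < n; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

A porter who has never seen the code can leave the macro
undefined and get a working program. The shortcut is visibly
optional rather than woven into the main logic.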

>Write the code once and be done with it.

Indeed.

                                            Steve Summit
                                            scs at adam.pika.mit.edu



More information about the Comp.unix.wizards mailing list