"tar" and non-8-bit byte machines

John Bruner jdb at mordor.UUCP
Tue Nov 20 05:36:55 AEST 1984


The S-1 Project at the Lawrence Livermore National Laboratory is
porting UNIX to our own machine, the S-1 Mark IIA.  One problem
that we're currently trying to solve is the implementation of "tar".

The crucial facts are:

1) The S-1 memory is organized into 36-bit words (addressable in
   9-bit quarterwords). **Sigh.**

2) On the S-1, characters are nine bits and are stored one per
   quarterword.

3) UNIX does not distinguish file types (e.g. character vs. binary).

The problem is this: we want to be able to read/write "tar" tapes
containing ASCII text files on both the VAX and the S-1. The
"obvious" mapping is for the S-1 to associate each 8-bit byte
with the low-order 8 bits of a 9-bit quarterword, discarding or
zero-filling the uppermost bit in the quarterword as appropriate.
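
As a rough illustration of this text mapping (a Python sketch, not
S-1 code; the function names are made up for the example and
quarterwords are modelled as plain integers 0..511):

```python
def bytes_to_quarterwords(data):
    """Tape -> S-1: each 8-bit byte becomes a quarterword with a zero ninth bit."""
    return [b & 0xFF for b in data]


def quarterwords_to_bytes(qwords):
    """S-1 -> tape: keep the low-order 8 bits, discard the ninth bit."""
    return bytes(q & 0xFF for q in qwords)
```

Going from tape to the S-1 and back is lossless for 8-bit data; going
the other way silently drops any ninth bit that happened to be set.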

A different mapping is required for binary files (because the
ninth bit is significant): the S-1 packs 9-bit quarterwords into 8-bit
bytes.  (There is hardware support for this conversion operation.)
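
The binary mapping can be sketched in software as follows.  This is
only an illustration in Python: the bit order (most significant bit
first) and the zero padding of a final partial byte are assumptions,
not a description of what the S-1 hardware actually does, and the
function names are hypothetical.

```python
def pack_quarterwords(qwords):
    """Pack 9-bit values densely into 8-bit bytes, MSB first (assumed order)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid low-order bits currently held in acc
    out = bytearray()
    for q in qwords:
        acc = (acc << 9) | (q & 0x1FF)
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the unconsumed bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def unpack_quarterwords(data, count):
    """Inverse of pack_quarterwords; 'count' is the number of quarterwords."""
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 9 and len(out) < count:
            nbits -= 9
            out.append((acc >> nbits) & 0x1FF)
            acc &= (1 << nbits) - 1
    return out
```

With dense packing, eight quarterwords (72 bits) occupy exactly nine
bytes, so the ninth bit of every quarterword survives the round trip.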

The issue is that, in order for the VAX to read S-1 text files
and vice versa, text files must be stored using a different
representation than binary files.  There is no reliable way to
determine whether a file should be "text" or "binary" when the
tape is written, and no field in the "tar" header for recording
this information even if the writer could reliably figure it out.
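
To make the difficulty concrete, here is one conceivable writer-side
guess, sketched in Python.  It is a hypothetical heuristic for
illustration only, not something "tar" or the S-1 software does.

```python
def looks_like_text(qwords):
    """Guess 'text' when no quarterword has its ninth bit set (heuristic only)."""
    return all((q & 0x100) == 0 for q in qwords)


ascii_file = [ord(c) for c in "hello\n"]
small_binary = [0x000, 0x001, 0x0FF]   # binary data that happens to fit in 8 bits
tagged_binary = [0x100, 0x041]         # ninth bit in use

print(looks_like_text(ascii_file))     # True
print(looks_like_text(small_binary))   # True, although the file is not text
print(looks_like_text(tagged_binary))  # False
```

Even if the writer made such a guess, the reader would still have to
know which mapping was applied to each file, and the header has
nowhere to record it.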

If all files on the "tar" tape are stored with 9-bit quarterwords
packed into 8-bit bytes, text files on the "tar" tape are
unusable on the VAX.  (Of course, we have programs which will
pack/unpack them, but this must be done manually and it is a real
hassle.)
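
A small Python sketch shows why packed text is unreadable on a
byte-oriented machine.  The bit order (most significant bit first)
and the helper name pack9 are assumptions made for the example.

```python
def pack9(text):
    """Pack each character as a 9-bit value, densely, MSB first (assumed order)."""
    bits = "".join(format(ord(c) & 0x1FF, "09b") for c in text)
    bits += "0" * (-len(bits) % 8)           # zero-pad to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


packed = pack9("tar")
print(packed)            # what a VAX would see in place of b"tar"
print(packed == b"tar")  # False
```

Every character after the first straddles a byte boundary, so the VAX
sees shifted bit patterns rather than ASCII.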

I don't want to define an incompatible "tar" format for the S-1.
I have used UNIX systems for M68000's which write tapes with byte
reversal problems, so that I could not read them directly on our VAX
(it was necessary to pipe the input through "dd conv=swab"), and I
feel that the intent of the "tar" format is to provide a standard
means for information exchange.  At this point, though, I can't
think of any alternative to an S-1-specific format.
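
For readers unfamiliar with it, the byte swap that "dd conv=swab"
performs amounts to exchanging each adjacent pair of bytes.  A
minimal Python sketch (the function name is hypothetical, and leaving
a trailing odd byte untouched is a choice made for this example):

```python
def swab(data):
    """Swap each adjacent pair of bytes; a trailing odd byte is left as is."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)     # length of the pairable prefix
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)


print(swab(b"abcd"))  # b'badc'
```

Applying the swap twice restores the original data, which is why one
pass through it was enough to make those tapes readable.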

P.S. Our next machine will have 32-bit words, but it will also have
hardware tags.  An image copy of a file on tape will include both
the 32-bit data and a 4-bit tag (probably stored in a fifth byte).
While the 9/8-bit packing problems will go away, the key problem still
remains: a "tar" text file should contain only characters (not tags),
so binary files and text files must be stored in different formats.
I don't see how to do this with the current "tar" definition.
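
The same split can be sketched for the tagged machine.  Everything
about the layout here is an assumption for illustration: big-endian
byte order, the tag in the low four bits of the fifth byte, and four
8-bit characters per 32-bit word.  The function names are
hypothetical.

```python
import struct


def image_record(word, tag):
    """Image copy: 32-bit data followed by the 4-bit tag in a fifth byte."""
    return struct.pack(">IB", word & 0xFFFFFFFF, tag & 0xF)


def text_bytes(words):
    """Text copy: data bytes only, with the tags left out."""
    return b"".join(struct.pack(">I", w & 0xFFFFFFFF) for w in words)


print(len(image_record(0x74617221, 3)))  # 5 bytes per word on tape
print(text_bytes([0x74617221]))          # b'tar!'
```

The two functions produce different byte streams for the same word,
which is the text/binary distinction the "tar" header cannot express.
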
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb at mordor.ARPA [jdb at s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb



More information about the Comp.unix.wizards mailing list