cpio(1) under Sun 3.0; or, does System V write filenames backwards?

guy at sun.UUCP
Sat Sep 6 14:53:06 AEST 1986


> The cure, as prescribed, is to dd(1) the contents first with the
> `conv=swab' option to swap all the bytes, including the header, before
> feeding to cpio (with the `-s' option set).  As I was only interested in
> a table of contents, I merely tried to get one via the `-t' and `-v' options
> to produce an `ls -l'-like output.  In doing so, I discovered that swapping
> all the bytes made cpio happy, yet somehow the filenames were still coming 
> out byteswapped!!
> ...
> I assumed that byteswapping everything would take care of the filenames
> as well, but apparently they are in the `correct' order (for Suns &
> 680x0 architecture, at any rate) before the byteswap.
> 
> How this might have arisen???  Is it a bug in the way
> it (the tape) was written originally, or a bug in cpio(1)?
> Or in the way a VAX writes char arrays?

The tape was written correctly.  VAXes write "char" arrays the way any sane
machine does: if character N of an array goes onto frame M of the tape,
character N+1 goes onto frame M+1, etc.  Any sane machine will also read
"char" arrays in the same way, so the character array

	"Kilroy was here"

will, if written to a tape by a sane little-endian machine, produce

	"Kilroy was here"

when read from that tape by any sane machine, whether big-endian or little-
endian.  Thus, the filenames are in the right order, except on insane
machines that swap bytes on character strings when they write them to tape
(there are such machines out there, alas).
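
To make the distinction concrete, here is a minimal Python sketch (an
illustration only; the variable names are made up, and a "tape" is modelled
as a plain sequence of bytes).  It shows that a character string is laid
down identically by either kind of machine, while a 16-bit "short" is not.

```python
import struct

text = b"Kilroy was here"

# A "char" array goes to tape one character per frame, in index order,
# so the frames are the same whatever the writer's byte order is.
frames = bytes(text[i] for i in range(len(text)))

# A 16-bit "short" is another matter: its two bytes land in opposite
# order depending on the machine that wrote it.
value = 0o070707
little = struct.pack("<H", value)   # little-endian writer (VAX-style)
big = struct.pack(">H", value)      # big-endian writer (680x0-style)

print(frames == text)               # same string on either machine
print(little.hex(), big.hex())      # the two byte orders of one short
```

Only multi-byte quantities have a byte order at all, which is why the
string needs no fixing and the "short" does.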

(BTW, "for Suns & 680x0 architecture" is redundant in this case; the 680x0
is big-endian regardless of whether it appears in a Sun or anything else.
The only machine I know of whose "endianness" is settable is the WE32000
chip, and maybe the later chips in that family; the endianness of that chip
is settable from one of the pins on the chip, but I don't think there are
many of them running as little-endian machines, if any at all.)

The problem is that a "cpio" tape consists of three kinds of data:

	1) Headers.  All the data in a header (unless the tape was written
	   with the "-c" option) are in the form of "short"s, and must be
	   byte-swapped if they are read on a machine with a different
	   byte order for "short"s.

	2) Pathnames.  This is a "char" array, and must not have its
	   byte order changed.

	3) File contents.  In general, this is either: text, which is,
	   in effect, a gigantic "char" array and must not have its
	   byte order changed, or binary data, which could require
	   an arbitrarily complex transformation, so simply changing the
	   byte order is unlikely to be useful.
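
The three regions can be sketched in Python as follows.  This is a hedged
illustration, not "cpio" source: it assumes the traditional binary header of
thirteen 16-bit fields (magic, dev, ino, mode, uid, gid, nlink, rdev,
mtime[2], namesize, filesize[2]) with the pathname and the contents each
padded to an even length; build_record and split_record are hypothetical
helper names.

```python
import struct

HEADER_SIZE = 26  # thirteen 16-bit fields (assumed layout, see above)

def pad_even(b: bytes) -> bytes:
    """Pad to an even number of bytes with a NUL."""
    return b + b"\0" * (len(b) % 2)

def build_record(name: bytes, data: bytes, order: str) -> bytes:
    """Build one record; order is "<" (little-endian) or ">" (big-endian)."""
    namez = name + b"\0"                      # namesize counts the NUL
    header = struct.pack(
        order + "13H",
        0o070707,                             # magic
        0, 0, 0o100644, 0, 0, 1, 0,           # dev..rdev (illustrative values)
        0, 0,                                 # mtime, high half then low half
        len(namez),                           # namesize
        len(data) >> 16, len(data) & 0xFFFF,  # filesize, high half then low
    )
    return header + pad_even(namez) + pad_even(data)

def split_record(record: bytes, order: str):
    """Split a record into (header fields, pathname, file contents)."""
    fields = struct.unpack(order + "13H", record[:HEADER_SIZE])
    namesize = fields[10]
    filesize = (fields[11] << 16) | fields[12]
    name = record[HEADER_SIZE:HEADER_SIZE + namesize]
    start = HEADER_SIZE + namesize + (namesize % 2)
    return fields, name, record[start:start + filesize]

vax = build_record(b"Kilroy", b"was here\n", "<")
sun = build_record(b"Kilroy", b"was here\n", ">")
# Only the header region depends on the writer's byte order.
print(vax[:HEADER_SIZE] != sun[:HEADER_SIZE], vax[HEADER_SIZE:] == sun[HEADER_SIZE:])
```

The pathname and the contents come out byte-for-byte identical from both
writers; the header is the only part that differs.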

"dd" will change the byte order of *all* the data on the tape; thus, the
headers will be read OK but everything else will be garbled.  The System III
"cpio"'s "-b" option would swap the pathnames and the file contents, leaving
the headers alone; thus, you first run the tape through "dd" and then
through "cpio -b" to read it correctly.  Obviously, the person who
implemented this realized that swapping most of the data on the tape twice
was far more efficient than swapping a small amount of it once.
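
A rough Python model of that double swap, under loud simplifying
assumptions: the "header" here is just the magic number, the "tape" holds a
single pathname, and swab is a stand-in for what "dd conv=swab" does to
each pair of bytes (it is not dd itself).

```python
import struct

def swab(buf: bytes) -> bytes:
    """Swap each adjacent pair of bytes; a trailing odd byte is left alone."""
    n = len(buf) - (len(buf) % 2)
    out = bytearray(buf)
    out[0:n:2] = buf[1:n:2]
    out[1:n:2] = buf[0:n:2]
    return bytes(out)

# Toy tape written on a little-endian machine: one "short" plus a pathname.
header = struct.pack("<H", 0o070707)
name = b"Kilroy was here\0"
tape = header + name

# Step 1: swap everything.  The header is now right for a big-endian
# reader, but the pathname has been garbled.
after_dd = swab(tape)
header_ok = struct.unpack(">H", after_dd[:2])[0] == 0o070707
garbled = after_dd[2:]

# Step 2: swap the pathname (and, on a real tape, the contents) back again.
restored = swab(garbled)

print(header_ok, garbled, restored)
```

Nearly all of the tape is swapped twice to end up where it started; only
the small header is changed for good.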

Some bear of equally little brain decided to "fix" this for System V; they
realized that almost all files written to "cpio" tapes consist solely of
characters, "short"s, or "long"s, so there should be options to swap bytes,
halfwords, or both, and those options should apply *only* to the data.
Thus, there is now no way to swap *just* the headers by some combination of
"cpio" and "dd".

The correct fix - available in our next release, because it bit *me* when
trying to read in a "cpio" tape made on a VAX - is to check the "magic
number" in the header.  If it is equal to a byte-swapped version of the
"cpio" magic number, then the tape is almost certainly a "cpio" tape written
on a machine of the opposite byte sex; "cpio" should then byte-swap the
header *and nothing else*.  This way, you don't have to worry about the byte
sex of the machine on which the tape was written (unless you're trying to
transport binary data, but in that case it's not a simple matter of byte sex
anyway); "cpio" will figure it out for you.
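
The check can be sketched in a few lines of Python.  This is only an
illustration of the idea described above, not the code in any release; it
assumes the binary header is thirteen 16-bit fields with the magic number
070707 (octal) first, and read_header and swap16 are hypothetical names.

```python
import struct
import sys

CPIO_MAGIC = 0o070707
HEADER_SIZE = 26  # thirteen 16-bit fields (assumed layout)

def swap16(x: int) -> int:
    """Exchange the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)

def read_header(raw: bytes):
    """Return (fields, was_swapped) for a binary header read on this machine."""
    fields = struct.unpack("=13H", raw[:HEADER_SIZE])  # native byte order
    if fields[0] == CPIO_MAGIC:
        return fields, False
    if fields[0] == swap16(CPIO_MAGIC):
        # Written by a machine of the opposite byte order: fix up the
        # header and nothing else.
        return tuple(swap16(f) for f in fields), True
    raise ValueError("not a binary cpio header")

# Simulate a header written on a machine of the opposite byte order.
foreign = ">" if sys.byteorder == "little" else "<"
raw = struct.pack(foreign + "13H", CPIO_MAGIC, 0, 0, 0o100644, 0, 0, 1, 0,
                  0, 0, 7, 0, 9)
fields, was_swapped = read_header(raw)
print(was_swapped, fields[10], fields[12])  # namesize and low half of filesize
```

The pathname and file contents that follow the header are then read as
plain bytes, with no swapping at all.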

Ideally, the "-c" option should be used; that writes the header in a
printable ASCII format, just as "tar" did N years before the "cpio"
maintainers figured it out.  Unfortunately, there is a bug in the System III
"cpio" that means that the "-c" format doesn't work right.  Equally
unfortunately, the S5R2 distribution tape wasn't written with "-c" ("gee,
why should it be, if it's a VAX distribution tape people are going to read
it in on their VAX, right?").
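
Why a printable header sidesteps the whole problem can be shown with a
short Python sketch.  The field widths below (6 octal digits per field, 11
for mtime and filesize, 76 characters in all) are those of the later
portable character format as I understand it; treat them as an assumption,
since the System III layout is not given here.  ascii_header and
parse_ascii_header are hypothetical helpers.

```python
WIDTHS = [6, 6, 6, 6, 6, 6, 6, 6, 11, 6, 11]  # magic..filesize (assumed)

def ascii_header(dev, ino, mode, uid, gid, nlink, rdev,
                 mtime, namesize, filesize) -> bytes:
    """Encode the header fields as fixed-width octal ASCII digits."""
    values = [0o070707, dev, ino, mode, uid, gid, nlink, rdev,
              mtime, namesize, filesize]
    text = "".join("%0*o" % (w, v) for w, v in zip(WIDTHS, values))
    return text.encode("ascii")

def parse_ascii_header(raw: bytes):
    """Decode the fields again; no byte order is involved anywhere."""
    text = raw.decode("ascii")
    fields, pos = [], 0
    for w in WIDTHS:
        fields.append(int(text[pos:pos + w], 8))
        pos += w
    return fields

hdr = ascii_header(0, 0, 0o100644, 0, 0, 1, 0, 0, 7, 9)
print(hdr)
print(parse_ascii_header(hdr))
```

Since every field is a string of characters, the header is just another
"char" array and reads the same on any sane machine.
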
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy at sun.com (or guy at sun.arpa)


