Myths about tape block sizes

Glenn Herteg iapsd!hopi!glenn at uunet.uu.net
Fri Dec 14 15:39:29 AEST 1990


In v9n397, wsrcc!wolfgang at uunet.uu.net (Wolfgang S. Rupprecht) writes:
>SCSI itself has a similar limit.  That's why one can't get more than 126
>blocks of 512 bytes in one tape read or write.

Ideas like this have tended to propagate into the lore about parameters
you should specify to user-level tape commands.  For example,

	setenv TAPE /dev/nrst8
	tar cvbfle 126 $TAPE tree

has often been considered the way to "efficiently" create a QIC-24 tape
archive.  However, regardless of whether such a limitation exists at the
hardware level, current SunOS releases (I use 4.0.1 on a 3/50) do a good
job of hiding this from the user.  For a long time I, too, didn't
understand this, and I often waited hours as my 1/4" cartridge drive sawed
back and forth.  Recently, though, I have run experiments which show that
much larger user block sizes work just fine and are FAR FASTER.  For
example,

	dd if=diskfile of=$TAPE bs=1000b

can be used to transfer the given diskfile (if its size is a multiple of
512 bytes; a padding workaround for odd-sized files is sketched below).
This block size is a big improvement over "bs=126b".  Reading
the tape back afterwards with

	dd if=$TAPE bs=1000b | cmp - diskfile

confirms that the data was written correctly.  (I don't know how much of a
performance difference the remote filesystem makes, but note that in such
transfers I often read files from a remote-mounted filesystem [a Wren disk
served by a 3/60].)
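
If the diskfile's size is not a multiple of 512 bytes, one workaround (a
sketch of my own, which I haven't timed) is to let dd pad the final short
block with NULs via "conv=sync":

	dd if=diskfile of=$TAPE bs=1000b conv=sync

Note that conv=sync pads the last block all the way out to the full bs, so
the tape copy is NUL-padded up to the next 512000-byte boundary; when you
read it back, trim the copy to the original length (the byte count "ls -l
diskfile" reports) before comparing.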

My remaining questions are these: given that the hardware value is not the
limit, what is the actual limit, and what is the optimal block size to
specify to tar, dd, and similar commands?  Certainly the optimal size must
be a tradeoff between the speed of the *disk* (and/or network connection)
you're reading from / writing to, and the time penalty for stopping and
starting the tape drive.  You want to overlap disk and tape I/O to
advantage, just as network analysts have found that optimal network
throughput is achieved not with huge blocks, but by balancing the time
spent generating the data against the time spent communicating it.  The
best performance comes when the CPU and the network are simultaneously
active, not when one must wait for the other to finish handling a large
block.  For a QIC tape, however, the cost of stopping and restarting the
streaming action seems largely to outweigh the cost of failing to overlap
computation and communication.
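
To gather some numbers, a rough timing loop like the following (my own
sketch, assuming a Bourne shell, the SunOS mt(1) command, and a test file
named diskfile; adjust the device and sizes to taste) writes the same file
at several block sizes and reports the elapsed time of each run:

	#! /bin/sh
	TAPE=/dev/nrst8
	# /dev/nrst8 does not rewind on close, so rewind explicitly
	# between runs to start each write from the same spot.
	for bs in 126b 252b 504b 1000b 2000b
	do
		mt -f $TAPE rewind
		echo "block size $bs:"
		time dd if=diskfile of=$TAPE bs=$bs
	done

Comparing the elapsed times (and listening to how often the drive stops to
reposition) should show roughly where the streaming threshold lies on a
given configuration.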

So now that the truth is revealed, has anyone done more extensive testing,
and could they provide some guidance to all of us so we can collectively
save years of wasted time?


