Misc uport bugs and observations

Mon Apr 4 12:22:15 AEST 1988

In article <4387 at b-tech.UUCP>, zeeff at b-tech.UUCP (Jon Zeeff) wrote:
> Dcopy doesn't seem to work.  A dcopy from one 4096 drive to another seemed to
> work ok, but fsck found many errors (too many to fix).

Bet you were bitten by the dual-drive-failure bug.  In my experience, that bug
is still with us on the 386: it just doesn't print the error message any more.
I had trouble with the WD1003 and WD1006; I don't have a second drive to test
the WD1007 with.

> When using a WD1006-WAH controller, the system will hang if it encounters a
> drive error.

I had this problem too.  Pretty much prevents you from using any drive that
does not have the manufacturer's bad sector list.  I did not determine whether
the fault was with the WD1006 or uPort's hd driver (but guess which I suspect).
The problems went away once I corrected the bad sector table for 1:1 interleave
(see last paragraph below: INSTALL makes dumb assumptions).

> In the install process, the -V and -v options don't work.  You must enter
> all the bad sectors by hand and hope that the list supplied with the drive
> is complete.  Mkpart will find bad sectors, but it won't mark them as bad.

In my experience, the manufacturer-supplied bad track list is complete, as
their analog equipment will find anything that a simple write-read test might
hope to find.  As an aside, I seriously question the accuracy of testing any
drive via mkpart or any post-manufacture test: it might be all you can do,
but it may also give you a false sense of security.

A more serious related problem is that uPort does not appear to permit more
than 62 bad sectors per drive.  On a big disk where the manufacturer only gives
the bad track numbers (or if you run third party test programs that return only
track numbers), you can quickly hit this number at 17 sec/trk.  I understand
the desire to limit the size of the alternates table, but not at the cost of
being unable to use a drive (perhaps a binary, not linear, search of the
alternates table is indicated?).
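
To put numbers on that: at 17 sec/trk, three bad tracks use 51 table entries
and a fourth would need 68, which is over the limit of 62.  The sketch below
(Python, purely illustrative) shows how a sorted alternates table could be
searched in O(log n) steps, so a larger table need not slow down every I/O.
The table layout, the names build_alternates and remap, and the spare-sector
numbering are all assumptions made for illustration; nothing here reflects
how uPort's hd driver actually stores or searches its table.

```python
from bisect import bisect_left

SECTORS_PER_TRACK = 17  # figure quoted in the post


def build_alternates(bad_tracks, first_spare, sectors_per_track=SECTORS_PER_TRACK):
    """Expand bad track numbers into two parallel lists, sorted by bad sector.

    Hypothetical layout: track numbers are absolute (cylinder * heads + head),
    and spare sectors are handed out consecutively starting at first_spare.
    """
    bad = []
    for track in sorted(bad_tracks):
        start = track * sectors_per_track
        bad.extend(range(start, start + sectors_per_track))
    spares = list(range(first_spare, first_spare + len(bad)))
    return bad, spares


def remap(bad, spares, sector):
    """Return the spare for a bad sector, or the sector itself if it is good."""
    i = bisect_left(bad, sector)          # binary search, not a linear scan
    if i < len(bad) and bad[i] == sector:
        return spares[i]
    return sector


if __name__ == "__main__":
    bad, spares = build_alternates([2, 10, 11, 40], first_spare=100000)
    print(len(bad), "entries needed for 4 bad tracks")
    print(remap(bad, spares, 34), remap(bad, spares, 5))
```

With a binary search, doubling the table size adds only one more comparison
per lookup, which is why the size limit looks avoidable.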

> Uport unix doesn't seem to reset the disk and try again when it encounters a
> disk error.

Is this related to the WD1006 problem reported above?  I assume so.

> On a 4096 drive, a WD1006 controller does about 235k/sec with 1:1
> interleave.  A normal controller with 1:3 does about 125k/sec.  Both
> test were done with "/bin/time cp /dev/dsk/0s1 /dev/null" and using
> real time on a unloaded machine.

The same command gave me 38.3:real, 0.1:user and 23.8:sys with a WD1007/WA2
and a Compaq-damaged CDC Wren III.  Didn't bother to kill cron or anything,
so it was "unloaded" only in that no one was doing anything.  That comes out
to 327K/sec.  Don't know how much time it takes to switch heads, so don't
know what the theoretical maximum rate is, though it's probably less than
three times that value (for an ESDI drive that's really 34 sec/trk - WD1007
emulates 17 sec/trk).
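
For a rough ceiling, the arithmetic can be sketched as below.  The 512-byte
sector size and 3600 rpm spindle speed are assumptions (typical for drives
of this class, but not stated anywhere above), and head-switch and seek
time are ignored, so real drives will come in under these figures.

```python
BYTES_PER_SECTOR = 512   # assumption, not given in the post
RPM = 3600               # assumption, not given in the post


def peak_rate_kb(sectors_per_track, interleave=1,
                 rpm=RPM, bytes_per_sector=BYTES_PER_SECTOR):
    """Upper bound in K/sec: reading one full track takes `interleave` revolutions."""
    revs_per_sec = rpm / 60
    return sectors_per_track * bytes_per_sector * revs_per_sec / interleave / 1024


if __name__ == "__main__":
    print("34 sec/trk, 1:1:", peak_rate_kb(34))      # ESDI drive, physical format
    print("17 sec/trk, 1:1:", peak_rate_kb(17))      # e.g. WD1006
    print("17 sec/trk, 1:3:", peak_rate_kb(17, 3))   # "normal" controller
    print("measured / ceiling:", 327 / peak_rate_kb(34))
```

Under those assumptions the 34 sec/trk ceiling is about 1020K/sec, so the
measured 327K/sec is roughly a third of it, and the 235k and 125k figures
quoted above sit under their respective ceilings of 510K and 170K.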

> Here is the bfi<->sector chart I came up with for 1:3 interleave.  I have no
> idea if it is correct.
> 
> Sector		Bfi
> [ table deleted ]

The table shown did not match the table on page 12 of the "Installation Notes
for Runtime System" that came with my documentation.  I'll mail the correct
table to anyone who sends mail (to jva at astro.as.utexas.edu: killer's situation
is probably eating my mail).

Be aware that the INSTALL script on the Build disk assumes that if you don't
have a Televideo, you're using 3:1 interleave.  Dumb assumption with the
WD1006 or WD1007 (i.e., Compaq 386/20 with the 150meg hard disk or PC's Ltd
with the 300meg drive).  You have to modify the build disk to use 1:1
interleave and have the bad sectors marked correctly.  Send mail to the
address in the paragraph above for details...
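
A rough sketch of why the interleave matters for the bad sector list: a
defect sits at a fixed physical position on the track (which is what a bfi
value describes), but which logical sector lands on that position depends
on the interleave.  The function below is a generic model of interleaved
sector ordering, with sectors numbered from 0; the name interleave_layout
and the numbering are assumptions, and the real controller's ordering and
the bfi offsets in the uPort documentation may differ.

```python
def interleave_layout(sectors_per_track=17, interleave=3):
    """Return a list where entry p is the logical sector stored in physical slot p."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        # Skip slots already taken (only happens when the interleave factor
        # shares a divisor with the sector count; 17 is prime, so not here).
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + interleave) % sectors_per_track
    return layout


if __name__ == "__main__":
    three_to_one = interleave_layout(17, 3)
    one_to_one = interleave_layout(17, 1)
    for slot in range(17):
        print(slot, three_to_one[slot], one_to_one[slot])
```

In this model the second physical slot holds logical sector 6 at 3:1 but
logical sector 1 at 1:1, so a bad sector list converted with the wrong
interleave marks the wrong sectors.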
-- 
James R. Van Artsdalen       jva at astro.as.utexas.edu         "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746