more tales of RA81s--handling of bad sectors

David L. Gehrt dave at RIACS.ARPA
Thu Nov 15 08:07:38 AEST 1984


If you haven't discovered it by now, there is (probably) more
misinformation and disinformation circulating about the UDA50 and its
attached devices than about any other computing machinery or
peripherals I have encountered in 20 or so years of being "around".  I
will not claim special immunity from the effects of the great
information void, but I will give what I believe to be the best
information available in response to your questions.  Perhaps someone
who has real, hard data can correct any misstatements contained herein.

First, C. Torek is correct that a number of drivers report hard errors
when in fact the drive or controller only found a soft error.  For
example, our RA81s have frequent soft errors of the type "[1-8] symbol
ecc error" reported in a datagram.  This indicates that the
controller/drive found an error and corrected it using its error
recovery logic.  Early driver versions would have reported these errors
as "hard".  Another thing that is difficult to discern from the console
output alone is whether an error message is the result of an "end
message" or a "datagram".  Multiple datagrams can be generated from a
single transaction to disk, and in fact they are not guaranteed to be
delivered to the host at all.  There is, on the other hand, only one
end message per transaction to disk, and it is guaranteed to be
delivered.  Another problem is that unless you have real clout with the
powers that be, getting the documentation needed to interpret the error
messages is a real hassle (read: impossible, as I understand the
current situation).  DEC is trying to keep competitors out of the
UDA50/RA?? market, I guess.  The final error message problem is that
few drivers (one?) will report the existence of a "bad block" on the
console, and there are hard errors which neither result in, nor flow
from, bad blocks.
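
For driver writers, here is a minimal sketch in C of the reporting rule
described above.  The struct, enum, and function names are hypothetical
(not taken from any real UDA50 driver), and real MSCP messages carry
this information in header and status fields I am not reproducing
here.  The point is only that a datagram, or an error the controller
corrected with ECC, gets logged but never fails the transfer; only the
single end message decides whether the I/O is a hard error.

```c
#include <stdbool.h>

/* Hypothetical message kinds; the real controller encodes the
 * message type in the message header. */
enum msg_kind { MSG_END, MSG_DATAGRAM };

/* Hypothetical verdicts a driver might reach for one message. */
enum severity {
    SEV_NONE,     /* nothing to report                          */
    SEV_LOGGED,   /* print/log it, but do not fail the transfer */
    SEV_HARD      /* set the error flag in the buf              */
};

struct ctlr_msg {
    enum msg_kind kind;
    bool error;       /* controller reported some error          */
    bool corrected;   /* e.g. an "N symbol ecc error" that the
                         controller's ECC logic already fixed     */
};

/* Datagrams may arrive several times per transfer, or not at all,
 * so they are only ever logged.  Only an uncorrected error in the
 * (exactly one, guaranteed) end message is treated as hard. */
enum severity classify(const struct ctlr_msg *m)
{
    if (!m->error)
        return SEV_NONE;
    if (m->kind == MSG_DATAGRAM || m->corrected)
        return SEV_LOGGED;
    return SEV_HARD;
}
```

A driver built along these lines would print the "[1-8] symbol ecc
error" datagrams as informational, and would only mark the buf as
failed when the end message says so.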

Your questions:

Is replacement the only solution to post-factory hard errors?

No.  It does seem to me that you first need to rule out electrical and
mechanical problems, which might indicate that repair or replacement is
in order.  I have heard of a number of "bad block" problems on RA??
drives which went away when grounding straps were cinched down, or
power supplies were tweaked or replaced.  Also, problems in the drives'
internal electronics can produce symptoms not much different in
appearance from media going bad (according to legend).  If the problem
still appears to be genuine bad blocks, there will soon be a driver
around which will handle the bad block reports and arrange for
revectoring.

Is there a formatter available for RA81's?

I don't think so.  As nearly as I can tell, the drives are formatted
using commands in a protocol (*NOT* MSCP) for which I have never seen
any documentation.  This means that the formatter available from DEC is
what there is, and it isn't too great.  It will format any amount of
the disk surface you would like, as long as it is the entire surface.
It is not clear that it will correctly handle bad blocks, except that
if it is true that the drive will not *write* a bad sector (a claim
about which I am very skeptical), then perhaps formatting and restoring
might be a way out.  There is a mode in which the standalone formatter
for the UDA50/RA?? devices will start from scratch and reinitialize an
entire pack, supposedly rebuilding the RCT and otherwise handling bad
blocks.  My CE says that once this is done, there is no guarantee that
the disk will *ever* be usable again.  Sounds like a real slick piece
of software to me.

Does it mark newly found bad sectors?

No, not on its own.  There are flags in the end message which indicate
that a bad block was detected and whether or not there were more which
couldn't be reported, and a field which indicates which logical block
has been found "bad".  The action taken, in most current drivers, is to
set an error flag in a struct buf and hang it up.  There is a fairly
complicated dance the host can engage in with the hardware to have the
block revectored.  The driver in beta test does this little dance.  If
the host throws away the bad block report, the controller couldn't care
less.  The legend that the controller handles bad blocks on its own is
a myth; I have never heard of a way to get the controller to do the
revectoring on its own.  There is nothing to keep a unix system from
doing the revectoring.  Contrary to the comments in /etc/disktab, the
RCTs required for the bad block forwarding operation lie beyond the
user-accessible disk surface, safely out of reach of normal disk
operations by the 4.2 driver.
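
Here is a sketch, again in C and again hypothetical, of what a driver
might do with those end message flags instead of just hanging the buf
up: remember the reported LBN so the host can run the revectoring dance
later.  The flag names and values reflect my understanding of the MSCP
"bad block reported" and "bad block unreported" end flags and should be
checked against DEC's documentation; struct revector_queue,
note_bad_block(), and host_size are made up for illustration.  The
dance itself (searching the RCT and issuing the replacement) is not
shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED end-message flag bits, modeled on MSCP; verify these. */
#define EF_BBLKR 0x80   /* a bad block was found; its LBN is reported */
#define EF_BBLKU 0x40   /* more bad blocks were found than reported   */

#define MAXPENDING 16

/* Hypothetical list of LBNs awaiting host-driven revectoring. */
struct revector_queue {
    uint32_t lbn[MAXPENDING];
    size_t   n;
    bool     rescan;    /* host should go looking for unreported blocks */
};

/* Record a bad block reported in an end message.  host_size is the
 * (hypothetical) count of user-visible LBNs; the RCT lives above it,
 * so a reported LBN at or beyond host_size is not queued.  Returns
 * true if the LBN is now (or already was) on the queue. */
bool note_bad_block(struct revector_queue *q, uint8_t flags,
                    uint32_t bad_lbn, uint32_t host_size)
{
    if (flags & EF_BBLKU)
        q->rescan = true;           /* controller couldn't report them all */
    if (!(flags & EF_BBLKR))
        return false;               /* no bad block in this end message */
    if (bad_lbn >= host_size)
        return false;               /* not a user-visible block */
    for (size_t i = 0; i < q->n; i++)
        if (q->lbn[i] == bad_lbn)
            return true;            /* already queued */
    if (q->n == MAXPENDING) {
        q->rescan = true;           /* no room; look again later */
        return false;
    }
    q->lbn[q->n++] = bad_lbn;
    return true;
}
```

The design choice is simply not to lose the report: once the
controller has told you about a bad block, nothing else will remind
you of it.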

Does the RA81 driver in ULTRIX handle bad sectors as claimed?

I have never heard any informed person, knowledgeable in ULTRIX, claim
that it does.  Several months ago I saw a copy of a driver purporting to
be from ULTRIX (miles of copyright notices, disclaimers, and so on), and
it had no code for bad block revectoring in it.  I have heard that the
ULTRIX folks are going to come up with a standalone program to do bad
block revectoring, but that is an unsubstantiated rumor, and the people
I tried to contact did not return my calls.

I hope this helps, but if you have more questions, drop me a line.

dave

P.S.

Oh, one person who responded to your message couldn't understand bad
blocks in the swap area.  Actually, I suspect that more i/o is done in
swap space than in other areas, so I would expect media deterioration
and bad blocks to show up there first.

As for the claim that a dump(8) and newfs(8) followed by a restore(8)
cleared up the bad blocks, I am skeptical that what was reported
represents reality.  The restore probably just picked different
blocks, or the errors reported were not in fact bad blocks.  The
controller, at least with our microcode version, makes a best-effort
attempt to write where you tell it to, and to report errors detected.
Also, my experience has been that real "bad blocks" do not just go
away.  So, although such a strategy might be worth a try, I wouldn't
get my hopes up too high.
----------



More information about the Comp.unix.wizards mailing list