Ra81's and bad blocks
Andy Linton
andy at cheviot.UUCP
Wed Mar 20 01:14:28 AEST 1985
There has been a lot of traffic about UDA-50 devices on the net and I
was confused about bad blocks etc. on them. I asked our Dec Service
Engineer for more info and he produced the following:
**************************
ALL YOU WANT TO KNOW ABOUT BAD BLOCK REPLACEMENT AND MORE
Introduction
The purpose of this transmission is to inform the readership of the
differences between Dec standard 144 and 166 disk media.
Dec Standard 144 media: Rx01/2, Rl01/2, Rk05J/F, Rk06/7, Rp04/5/6/7,
R80, Rm02/3/5/80.
Above are some of the media that fall into the Dec standard 144
classification. A general rule of thumb is that any massbus disk media
conforms to this Standard. The rule may change, but in general it
holds true. The above also includes serial and parallel drive
subsystems, i.e. the Rl01/2, Rk06/7 and Rk05, Rx01/2.
Dec Standard 166 media: Ra60/1, Ra80/1/2.
Above are some of the media that fall into the Dec standard 166
classification. A general rule here is, if it plugs into a UDA-50 or
a UDA-50 emulator, it's 166 media.
Differences with respect to bad blocking:
Bad blocking, by definition, is the generation of a file by a
software utility that contains information with respect to pattern
sensitive or unreadable areas of the media under test.
With the exception of Rx and Rk05 media, the manufacturer of the
media tests it and creates a manufacturer's bad block area. One of the
major differences between these media is that on 166, additions to the
manufacturer's area are not allowed, as they are on 144. Another major
difference between these standards is the number of bad blocks allowed,
i.e. 61 entries (Rp06) on 144 vs 17 thousand (Ra81) on 166.
The above manufacturer's areas differ greatly between these two media.
On the Ra series this table is known as the Factory Control Table, or
the FCT. This table could be loosely compared with the Rp series
manufacturer's bad block table, i.e. they both contain bad blocks found
during manufacture, but this comparison is misleading. During the
initialisation process on an RPxx pack, the manufacturer's bad block
table is read and normally, dependent on the operating system, two
separate files are generated. During the initialisation process on an
Ra pack the FCT is not readable and we therefore create two
files with null entries, assuming our initialisation process doesn't
know about 166 media.
If we compare what occurs on major operating systems during bad
block detection, I hope the reader can make sense of the above
statement. Say a bad block develops on an initialised RPxx on a
running system. The drive subsystem reports a hard ECC error to the
operating system, and the action taken by the operating system on
receipt of this error normally takes the form of x number of retries
with offsets. If at the end of the day the error reported by the
subsystem is still an ECH, a hard uncorrectable ECC error, an addition
is made to a file, let's say badblock.log. The resultant actions taken
by the system are: one, the data in that block is lost, and two, that
block is never used again during write operations. How this is
accomplished, again dependent on the operating system, is that at
mount time or on detection the badblock files are read and stored in
memory. In other words, it becomes a system overhead.
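The host-side scheme just described can be sketched roughly as below.
This is only an illustrative Python model, not code from any DEC
operating system: the class name, the retry count and the read_once
callback are all assumptions made for the example.

```python
# Illustrative model only. MAX_RETRIES stands in for the memo's
# "x number of retries with offsets"; the real value is system dependent.
MAX_RETRIES = 8


class HostBadBlockList:
    """Host-maintained bad block list for a Standard 144 style pack."""

    def __init__(self, manufacturer_entries):
        # At mount time the bad block files are read and held in memory.
        self.bad = set(manufacturer_entries)

    def handle_read(self, block, read_once):
        """read_once(block, attempt) returns data, or None on a hard ECC error."""
        for attempt in range(MAX_RETRIES):
            data = read_once(block, attempt)
            if data is not None:
                return data
        # Still uncorrectable after all retries: the data is lost and the
        # block is recorded so that it is never written to again.
        self.bad.add(block)
        return None

    def usable_for_write(self, block):
        # Every allocation must consult the in-memory list; this lookup
        # is the "system overhead" the memo refers to.
        return block not in self.bad
```

The point of the sketch is that the list lives in the host, so the host
pays for it on every mount and every allocation.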
This differs with respect to the RAx series. On mount the same
action occurs as on the RPx subsystem, but since there are no entries
(let's say), no harm is done here. As we write information onto this
structure, the RAx microprocessor notes from the target header that
the block is bad and consults the Re-Vector Control Table, RCT, to find
where the data should be written, i.e. where the block has been
re-vectored. Thus after the write, a re-vector has been accomplished
without system intervention.
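By contrast, the drive-side re-vectoring can be pictured with a small
Python model like the one below. It is a sketch under assumptions: the
class and method names are invented for illustration, the RCT is reduced
to a simple mapping, and the "bad" flag in the block header is modelled
as the block simply having an entry in that mapping.

```python
class RaDriveModel:
    """Toy model of a drive that re-vectors bad blocks on its own."""

    def __init__(self, rct=None):
        # RCT modelled as: bad block number -> replacement block address.
        self.rct = dict(rct or {})
        self.media = {}

    def resolve(self, lbn):
        # A block with an RCT entry is treated as flagged bad in its
        # header; anything else is used where it is.
        return self.rct.get(lbn, lbn)

    def write(self, lbn, data):
        target = self.resolve(lbn)
        self.media[target] = data
        return target  # returned only so the example can show the re-vector

    def read(self, lbn):
        return self.media.get(self.resolve(lbn))
```

The host only ever asks for block numbers; it never sees the mapping,
which is why an empty host-side bad block file does no harm here.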
The RCT is a direct copy of the FCT. Neither table is directly
accessible, at present, by any operating system, only by the
applicable engineering diagnostic. If during a read operation the
subsystem reports an ECC error and the operating system supports Bad
Block Replacement, BBR, the system can determine from the reported
error, i.e. 1-8 symbol ECC errors, when it wants to re-vector the
block, before the data degrades to unreadable. If it is ascertained
that the data is unreadable, worst case ECH, a four phase process is
started. Phase one: the error, hard or recoverable, is reported via
the UDA-50 to the operating system. If the error is hard or the limit
is reached, the system starts phase two: recover data, test block, and
report findings on the suspect block. What happens during this phase
is that the data is read and written into a scratch area of the RCT,
and a test pattern similar to the read data is re-written. After a
re-read, error information is passed back to the system: "yes, bad
block". The system says go, phase three please: find and test a
primary or secondary replacement block, mark the header of the bad
block as bad, add the block to the RCT, and report errors or
completion. Go phase four: write the data to the re-vectored block.
If an ECH occurred during the read, write good ECC but invert the EDC
bit to notify the system that a forced replacement occurred, i.e. the
data had an ECH and must be re-written or restored from whatever
backup media was last used.
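The four phases can be laid out as a rough Python sketch. Everything
named here (Pack, replace_block, the still_bad and spare_is_good
callbacks) is hypothetical and exists only to show the order of events;
it does not reflect the actual MSCP commands or the drive microcode, and
the behaviour when the suspect block passes its test is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    data: dict = field(default_factory=dict)    # block -> contents
    rct: dict = field(default_factory=dict)     # bad block -> replacement
    spares: list = field(default_factory=list)  # candidate replacement blocks
    scratch: object = None                      # scratch area of the RCT
    forced: set = field(default_factory=set)    # blocks written with inverted EDC


def replace_block(pack, lbn, hard_error, still_bad, spare_is_good):
    # Phase one has already happened: the error was reported to the host,
    # which decided (hard error, or limit reached) to call this routine.

    # Phase two: save the data to scratch, then test the suspect block.
    pack.scratch = pack.data.get(lbn)
    if not still_bad(lbn):
        return lbn  # assumption: a block that tests good is left in place

    # Phase three: find and test a replacement, then record it in the RCT.
    while pack.spares:
        candidate = pack.spares.pop(0)
        if spare_is_good(candidate):
            break
    else:
        raise RuntimeError("no good replacement block available")
    pack.rct[lbn] = candidate

    # Phase four: write the saved data to the replacement. After a hard
    # error the copy is suspect, so flag it as a forced replacement.
    pack.data[candidate] = pack.scratch
    if hard_error:
        pack.forced.add(lbn)
    return candidate
```

For example, with spares ["r0", "r1"] where only "r1" tests good, a hard
error on block 7 leaves the RCT mapping 7 to "r1" and block 7 in the
forced set.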
Summary
I hope the Readership can tell from the above that BBR, if implemented,
will protect data to a level not previously thought possible. Drive
microcode determines at what error limit BBR kicks off.
Note from the above that only the RCT is updated with additional bad
blocks, not the FCT. If a reformat is done on the device, the RCT data
is zeroed and the FCT information is put back. This hands any
additional bad blocks back to your users, assuming the formatter
doesn't find these pattern sensitive areas. I would only recommend
that a re-format be done after gross numbers of re-vectors due to
read/write problems. If inverted EDCs are a problem, have
engineering write to the customer area using /sec:manual. Your data
will be lost, but this action will re-invert the EDCs and will not
lose the good information held in the RCT.
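The effect of a reformat on the two tables can be shown with a few lines
of Python. This is a sketch only: the function name and the use of plain
sets of block numbers are assumptions, and the real tables hold more
than a list of bad blocks.

```python
def reformat(fct, rct, found_by_formatter=frozenset()):
    """Return (new RCT, blocks handed back to users) after a reformat."""
    # The RCT is rebuilt from the factory table, plus anything the
    # formatter itself happens to detect this time round.
    new_rct = set(fct) | set(found_by_formatter)
    # Blocks that had been re-vectored in service but are no longer
    # listed go back into use, bad or not.
    handed_back = set(rct) - new_rct
    return new_rct, handed_back
```

With a factory table of {5, 9} and an in-service RCT of {5, 9, 40, 77},
a reformat that detects only block 40 hands block 77 back to users.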
Regards
Ed Merrill
Country support Engineer
Internal Consultancy group
Basingstoke, England
44-256-56101 ext 3778
******************
I hope this is of some interest to those of you who have problems
with Ra81's (as I do).
Andy
Aindrias Mac Giolla Fhionntain - Computing Lab., U of Newcastle upon Tyne
ARPA : andy%cheviot%newcastle.mailnet at MIT-MULTICS.ARPA
UUCP : UK!ukc!cheviot!andy
*** Ni fui moran beagan d'aon rud, ach is fui moran beagan ceille. ***