RL02 headache

KENNER at NYU-CMCL1.ARPA
Wed Dec 19 07:11:38 AEST 1984


Data-late errors are fairly common on RL01/2's.  But they should be retried
by the driver, so they are not a real problem.  (There was a time early in
the history of the RL01's when a data late in the wrong part of a write
would clobber the next sector, but we found an ECO for that and DEC has had
it in the controllers for at least 4 years by now.)  I do not know if the
UNIX driver will retry them, but the RSX and VMS drivers do.

However, you do NOT seem to be getting data late errors!  Look at the
descriptions of those bits more carefully!

Here's the description from the RL01 User's Manual (it is the same for
the RL02 but I don't have that manual handy):

bit 10	Operation Incomplete (OPI)	When set, this bit indicates that
					the current command was not
					completed within 200ms.

bit 12	Data Late (DLT) or Header
	Not Found (HNF error)		When OPI (bit 10) is cleared and
					bit 12 is set, it indicates, that,
					on a write operation, the silo was
					empty and, therefore, a word was
					not available for writing; or,
					on a read operation, that the 
					silo was full and unable to store
					another word from the drive.

					When OPI (bit 10) is set and bit
					12 is also set, it indicates that
					a 200ms timeout occurred while the
					controller was searching for the
					correct sector to read or write
					(no header compare).
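
To make the distinction concrete, here is a minimal sketch (in Python,
purely for illustration; it is not taken from any real driver) of how the
two bits quoted above combine.  The function name and the returned strings
are my own invention; only the bit positions come from the manual excerpt.

```python
# Bit positions as given in the manual excerpt quoted above.
OPI = 1 << 10       # bit 10: Operation Incomplete
DLT_HNF = 1 << 12   # bit 12: Data Late, or Header Not Found when OPI is set


def classify(csr):
    """Classify a status word by looking at bits 10 and 12 only.

    Hypothetical helper: all other bits in the word are ignored.
    """
    opi = bool(csr & OPI)
    bit12 = bool(csr & DLT_HNF)
    if bit12 and opi:
        return "HNF"    # timeout while searching for the sector header
    if bit12:
        return "DLT"    # silo empty on write, or silo full on read
    if opi:
        return "OPI"    # command not completed within 200ms
    return "none"
```

So a status word with both bits set (for example 012000 octal) is a Header
Not Found, while bit 12 alone (010000 octal) is a true Data Late.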

So what you are really getting is a Header Not Found.  This is most likely
a bad sector on the device.  As to why VMS BAD and EVRAA don't find it,
the only possibility that I can think of is that it is in a reserved
area of the pack (such as the last track) which UNIX, for some reason,
is trying to access.

Your FE is right -- there does seem to be a problem with a bad header.

As to solutions, here's what I can think of:

(1) Try another pack.
(2) It is conceivable that the system is trying to access an invalid sector.
    Look at the sector number to make sure that it is valid.  Another
    possible problem area is that seeks to the RL01/2 are given as deltas
    from the current position, and the system might be confused as to the
    current position.  The driver, when it gets an HNF error, should attempt
    recovery by forcing the drive to cylinder zero and retrying.  If it
    doesn't do this, it should.
(3) Alternatively, try forcing EVRAA to just run on a small range centered
    around the problem area.  I THINK (but am not sure) that this can be
    done with EVRAA.
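
As a rough sketch of the checks in (2), again in Python for illustration
only: the first function tests whether a disk address is in range, and the
second retries a transfer after forcing the drive back to a known position.
The geometry figures (40 sectors per track, 2 heads, 256 cylinders on an
RL01 and 512 on an RL02) are from memory and should be checked against the
manual.  All names here (address_is_valid, transfer_with_recovery,
do_transfer, recalibrate, HeaderNotFound) are hypothetical and stand in for
whatever the real driver does.

```python
# Assumed geometry; verify against the RL01/RL02 User's Manual.
SECTORS_PER_TRACK = 40
HEADS = 2
CYLINDERS = {"RL01": 256, "RL02": 512}


class HeaderNotFound(Exception):
    """Hypothetical error raised when the controller reports HNF."""


def address_is_valid(drive_type, cylinder, head, sector):
    """Return True if the disk address lies within the pack."""
    return (0 <= cylinder < CYLINDERS[drive_type]
            and 0 <= head < HEADS
            and 0 <= sector < SECTORS_PER_TRACK)


def transfer_with_recovery(do_transfer, recalibrate, max_tries=3):
    """Attempt a transfer; on HNF, re-establish the position and retry.

    do_transfer and recalibrate are caller-supplied callables standing in
    for the driver's own routines.
    """
    for _ in range(max_tries):
        try:
            return do_transfer()
        except HeaderNotFound:
            # The driver's idea of the current cylinder may be wrong, so
            # force the heads to a known position before the next try.
            recalibrate()
    raise HeaderNotFound("still failing after %d tries" % max_tries)
```

If the address is out of range, the fault is in the software above the
driver rather than on the pack.  If the transfer succeeds only after the
recalibration, the driver had most likely lost track of the head position.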
 
-------


