tbuf errors, 750 Rev 7

Bruce Nemnich bruce at godot.UUCP
Sat Dec 15 09:54:15 AEST 1984


Nor did I mean to jump on irwin; I realize he was quoting DEC.  When I
first started fighting this problem, several DEC folks as much as said
it was a Unix bug: they will usually say something like, "It is a
problem with 4.2bsd on a 750" rather than, "It is a problem with the 750."

During the 5 months this machine ran VMS prior to conversion to Unix, it
crashed once every three or four weeks because of the same problem.
Other times it would simply kill the process which was running when the
machine check occurred.  After switching to Unix, the frequency increased
by a factor of about 3.  When I fixed the incorrect mask for the tbuf
error in machdep.c, it cut the crashes in half.  After trying many board
swaps, I finally got one whose error incidence was much lower.  But on
to better things....

I had Rev 7 installed on Tuesday.  Barry Lustig was kind enough to send
me code from Jim McKie (mcvax!jim) to load the patchable control store
file off disk as part of the boot sequence.  I have had no problems
after 3+ days.

However, one of the guys doing the installation said that the Rev 7
upgrade actually makes the problem worse by altering the way the machine
checks (?!) on tbuf and/or cache errors.  I don't know what he meant by
this (and I am not sure he did), and I have not tried to verify it.

The Rev 7 FCO description says it is fixed: "TB parity error machine
checks, due to TB RAM soft errors, are fixed by CMT098 micro-code....
CMT098 will perform single-retry recovery attempt, per
macro-instruction, without error-report; but will generate standard
Machine Check on subsequent or hard errors."
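
For what it is worth, here is how I read that retry policy, written out
as a small C model.  This is only a sketch of the behavior the FCO text
describes, not DEC's microcode: the names (run_instruction, the outcome
codes) are made up for illustration, and I am assuming "subsequent"
means a second parity error within the same macro-instruction.

```c
/* Hypothetical outcome codes; these are not DEC's names. */
enum outcome { COMPLETED, COMPLETED_AFTER_RETRY, MACHINE_CHECK };

/*
 * Model one macro-instruction under the policy quoted above.
 * `parity_errors` is the number of consecutive attempts that hit a
 * TB parity error: 0 = clean, 1 = soft error that clears on retry,
 * 2 or more = persistent (hard) error.
 */
static enum outcome run_instruction(int parity_errors)
{
    int retried = 0;    /* assumed budget: one silent retry per instruction */

    for (;;) {
        if (parity_errors == 0)     /* this attempt saw no parity error */
            return retried ? COMPLETED_AFTER_RETRY : COMPLETED;

        parity_errors--;            /* this attempt hit a parity error */

        if (retried)                /* second error in the same instruction */
            return MACHINE_CHECK;   /* standard machine check is raised */

        retried = 1;                /* first error: retry, report nothing */
    }
}
```

Under this reading, run_instruction(1) finishes quietly after one retry,
so the operating system never sees the soft error, while
run_instruction(2) still ends in a machine check.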
-- 
--Bruce Nemnich, Thinking Machines Corporation, Cambridge, MA
  ihnp4!godot!bruce, bjn at mit-mc.arpa ... soon to be bruce at godot.arpa


