System crashing.. HELP!
Ross Parker
parker at zaphod.Berkeley.EDU
Tue Mar 6 22:40:37 AEST 1990
System: Microvax-II with Emulex QD-32 disk controller and
two Fujitsu Eagle disk drives. 13 MB memory. Running
Ultrix 3.0. System supports perhaps 15-20 interactive
logins, and perhaps 20 PCs connected via Sun's PC-NFS.
The PCs access files using standard NFS on the Microvax.
Symptom: One user on a PC can try to bring up a particular
file under WordPerfect (version 5.0 or 5.1) on the
PC, and, without fail, cause the Microvax to
instantly crash. This problem just started happening.
No system changes, either hardware or software, have
taken place for a number of months. The user does
not have a problem with any other files, nor does
any other user cause the system to die. The system
is also used for NFS operations from other Vaxen, and
from some Sun systems, and no problems occur. The crash
symptoms are (on the console):
Trap Type 9, code = 803771ff, pc = 80034ca0
panic: Protection fault
and in the error log:
********************************* ENTRY 29. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 104. CONTROLLER ERROR
SEQUENCE NUMBER 0.
OPERATING SYSTEM ULTRIX 32
OCCURRED/LOGGED ON Mon Mar 5 13:44:28 1990 PST
OCCURRED ON SYSTEM waters
SYSTEM ID x08000000
SYSTYPE REG. x01010000
FIRMWARE REV = 1.
PROCESSOR TYPE KA630
----- UNIT INFORMATION -----
UNIT CLASS ADAPTER/CONTROLLER
UNIT TYPE UDA50A
CONTROLLER NO.
UNIT NO. 0.
ERROR SYNDROME CONTROLLER ERROR
********************************* ENTRY 30. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 200. PANIC
SEQUENCE NUMBER 5.
OPERATING SYSTEM ULTRIX 32
OCCURRED/LOGGED ON Mon Mar 5 13:42:26 1990 PST
OCCURRED ON SYSTEM waters
SYSTEM ID x08000000
SYSTYPE REG. x01010000
FIRMWARE REV = 1.
PROCESSOR TYPE KA630
PANIC MESSAGE Protection fault
********************************* ENTRY 31. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 109. EXCEPTION/FAULT
SEQUENCE NUMBER 4.
OPERATING SYSTEM ULTRIX 32
OCCURRED/LOGGED ON Mon Mar 5 13:42:26 1990 PST
OCCURRED ON SYSTEM waters
SYSTEM ID x08000000
SYSTYPE REG. x01010000
FIRMWARE REV = 1.
PROCESSOR TYPE KA630
----- UNIT INFORMATION -----
ERROR SYNDROME PROTECTION FAULT
********************************* ENTRY 32. *********************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 250. ASCII MSG
SEQUENCE NUMBER 7.
OPERATING SYSTEM ULTRIX 32
OCCURRED/LOGGED ON Mon Mar 5 13:42:40 1990 PST
OCCURRED ON SYSTEM waters
SYSTEM ID x08000000
SYSTYPE REG. x01010000
FIRMWARE REV = 1.
PROCESSOR TYPE KA630
MESSAGE done
********************************* ENTRY 33. *********************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 250. ASCII MSG
SEQUENCE NUMBER 6.
OPERATING SYSTEM ULTRIX 32
OCCURRED/LOGGED ON Mon Mar 5 13:42:39 1990 PST
OCCURRED ON SYSTEM waters
SYSTEM ID x08000000
SYSTYPE REG. x01010000
FIRMWARE REV = 1.
PROCESSOR TYPE KA630
MESSAGE syncing disks...
Now this looks like a bad controller, right? Well, we've
replaced the controller with a new one, and get an identical
problem... down to identical register values in the register
dump. We've also run DEC diagnostics and the system passes with
no problems, other than (and this I'm mildly worried about) the
disk controller. However, I believe the controller fails DEC's
diags because it's a non-DEC controller, and the diags are
expecting a KDA50. The controller (the new one) passed the
vendor's diags, and the diags included scanning the disks. No
problems were found anywhere.
In addition, we can read and write any file on both disks
locally (not via NFS) and no problems occur, so the problem
appears to be related to NFS rather than to the disk driver
code.
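One detail in the log excerpt may be worth checking: the entries
are not listed in time order. A minimal sketch (Python, with the
entry numbers, sequence numbers, and timestamps copied by hand
from the excerpt above; the function name is just for
illustration) that sorts them:

```python
from datetime import datetime

# (entry number, sequence number, event type, logged time),
# copied by hand from the error log excerpt.
ENTRIES = [
    (29, 0, "CONTROLLER ERROR", "Mon Mar 5 13:44:28 1990"),
    (30, 5, "PANIC", "Mon Mar 5 13:42:26 1990"),
    (31, 4, "EXCEPTION/FAULT", "Mon Mar 5 13:42:26 1990"),
    (32, 7, "ASCII MSG: done", "Mon Mar 5 13:42:40 1990"),
    (33, 6, "ASCII MSG: syncing disks...", "Mon Mar 5 13:42:39 1990"),
]

TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def chronological(entries):
    """Sort by logged time, breaking ties with the sequence number."""
    return sorted(
        entries,
        key=lambda e: (datetime.strptime(e[3], TIME_FORMAT), e[1]),
    )


if __name__ == "__main__":
    for number, seq, event, when in chronological(ENTRIES):
        print(when, "entry", number, "seq", seq, event)
```

Sorted this way, the protection fault and panic come first and
the controller error is logged about two minutes later, with
sequence number 0. That would be consistent with the controller
error being logged after the reboot rather than before the
crash, but that reading is an assumption, not something the log
states.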
Perusing the resultant crash dumps has not given me
much enlightenment. However, I'm not an expert at that, and
have misplaced my list of magic incantations to get adb to show
anything useful. Perhaps someone can enlighten me? Care to
bite, George? I'm sure you've done this numerous times.
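Short of adb, one rough way to see which kernel routine contains
the pc from the console message is to look it up in a
numerically sorted symbol listing of the kernel. A minimal
sketch, assuming a listing in the usual nm layout of "hex value,
type letter, name" per line (for example from nm -n on the
running kernel image); the function name and the file name in
the usage note are hypothetical:

```python
def nearest_symbol(nm_output, addr):
    """Return (name, offset) for the highest symbol at or below addr.

    nm_output is text with one "hexvalue type name" triple per line.
    Lines that do not have exactly three fields (such as undefined
    symbols, which carry no value) are skipped. Returns None if no
    symbol lies at or below addr.
    """
    best = None
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            value = int(parts[0], 16)
        except ValueError:
            continue  # first field was not a hex address
        if value <= addr and (best is None or value > best[0]):
            best = (value, parts[2])
    if best is None:
        return None
    return best[1], addr - best[0]
```

With the listing saved to a file (say syms.txt), calling
nearest_symbol(open("syms.txt").read(), 0x80034ca0) would give
the name of the routine and the offset of the faulting pc within
it. This only narrows things down to a routine; it is not a
substitute for a stack trace from the crash dump.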
If anyone can shed any light on this, it'd be *much*
appreciated. This Monday, the system went down about 7 times
before this particular user called us to say that each crash
happened exactly when she tried to access this file!
Thanks,
Ross Parker parker at mpre.mpr.ca
(604)293-5495 uunet!ubc-cs!mpre!parker