Kernel mode trap. Type 0x0000000E

Sat Mar 16 10:27:30 AEST 1991

In article <1991Mar05.164607.1179 at ism.isc.com>, support at bomber.ism.isc.com (Support Account) writes:
|> In article <1991Feb28.185352.22561 at DMI.USherb.CA> beauchem at terre (Denis Beauchemin) writes:
|> >We often see the following error message on different 386/33 MHz systems (some
|> >are SCSI and some aren't).  UNIX SVR3.2.2 is installed:
|> >6 lines of register information, then
|> >PANIC: Kernel mode trap. Type 0x0000000E
|> >Could someone tell me what is the cause of this PANIC?  We've been told that
|> >it's supposed to be related to memory, but is it hardware or software?
|> 
|> According to Intel's 386 chip documentation, interrupt 14 (="e")
|> is a page fault exception occurring when paging is enabled and an
|> error occurs translating a linear address to a physical address.
|> This error can be caused if the procedure doesn't have privileges
|> to access the page, or if the page-directory or page-table entry
|> used for address translation has a zero in its present bit.
|> 
|> A driver going awry can cause this condition. It can be analyzed
|> by building the kernel debugger into the kernel. When the fault
|> occurs and the system panics the os should drop into the debugger
|> which can then be used to display registers and to do a stack dump.
|> Under ISC Unix, the debugger is documented in section 8 of the
|> reference manual.
|> 
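For reference, the trap type and the page-fault error code described above can be decoded with a few lines of C. This is only a sketch: the bit meanings follow Intel's 386 documentation, but the names (TRAP_PAGE_FAULT, PF_PROT, PF_WRITE, PF_USER, is_page_fault, describe_page_fault) are made up for illustration, and it assumes the error code can be read out of the panic's register dump or the kernel debugger.

```c
#include <stdio.h>

#define TRAP_PAGE_FAULT 0x0E  /* vector 14, the "Type" in the panic message */

/* Page-fault error code bits pushed by the 386 (hypothetical macro names). */
#define PF_PROT  0x1  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2  /* 0: read access,      1: write access         */
#define PF_USER  0x4  /* 0: supervisor mode,  1: user mode            */

/* Returns 1 if the trap type from the panic line is a page fault. */
int is_page_fault(unsigned trap_type)
{
    return trap_type == TRAP_PAGE_FAULT;
}

/* Formats a one-line description of the error code into buf; returns buf. */
const char *describe_page_fault(unsigned err, char *buf, size_t len)
{
    snprintf(buf, len, "%s on %s in %s mode",
             (err & PF_PROT)  ? "protection violation" : "page not present",
             (err & PF_WRITE) ? "write" : "read",
             (err & PF_USER)  ? "user" : "supervisor");
    return buf;
}
```

A kernel mode trap of this type would normally show the supervisor-mode case, since the fault was taken while the kernel itself was running.
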
Well, it seems there are plenty of ways to hit that wall. I'd like to share one:
We have an Armas 486/25 board here that employs a chip set by OPTI and has a
128k secondary level cache. We've got 16MB in it and disabled relocation and caching
on the 256k left over from the first Meg. After running for about eight hours
the machine will start to get flaky: programs dump core at will, etc. Sooner
or later there will be a page fault within the kernel. Any reboot will reproduce
the error almost immediately after start up (a heat problem?--everything seems
rather cool when I take the machine apart, though). Disabling the second level
cache made the problem go away. I guess some of the secondary level cache's
static RAM is faulty. Since there is no parity checking on that RAM, the fault is
never detected. If the kernel has to fetch some pointer from the 2nd-level cache and
that data is mangled, dereferencing that pointer might trigger the page fault.
Too bad there is no easy way to find out which one of the 18 static RAMs has gone
bad... or is there?
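
One possible angle, sketched in C below: if a pattern test ever catches a mangled word, XORing it with the expected value shows which data bits flipped, and tallying the flips per bit position may point at one chip. This rests on assumptions not established above: that each static RAM serves a fixed group of data bits on this board, that the bad part is a data RAM rather than a tag RAM, and that the test's reads actually come out of the cache. The function names (tally_bit_errors, worst_bit) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare what was written (expected) with what was read back (observed).
 * Every set bit in the XOR is a flipped data bit; count flips per position.
 * The caller zeroes counts[] before the first call.
 */
void tally_bit_errors(const uint32_t *expected, const uint32_t *observed,
                      size_t nwords, unsigned long counts[32])
{
    size_t i;
    int bit;

    for (i = 0; i < nwords; i++) {
        uint32_t diff = expected[i] ^ observed[i];
        for (bit = 0; bit < 32; bit++)
            if (diff & ((uint32_t)1 << bit))
                counts[bit]++;
    }
}

/* Returns the bit position with the most flips, or -1 if nothing flipped. */
int worst_bit(const unsigned long counts[32])
{
    int bit, worst = -1;
    unsigned long max = 0;

    for (bit = 0; bit < 32; bit++) {
        if (counts[bit] > max) {
            max = counts[bit];
            worst = bit;
        }
    }
    return worst;
}
```

If the flips cluster on one bit position, the board's schematic (or swapping the socketed RAMs around and rerunning) would be needed to tie that bit to a particular chip.
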

:-> tom

-- 
----
Thomas M. Hoberg   | UUCP: tmh at bigfoot.first.gmd.de  or  tmh%gmdtub at tub.UUCP
c/o GMD Berlin     |       ...!unido!tub!gmdtub!tmh (Europe) or
D-1000 Berlin 12   |       ...!unido!tub!tmh
Hardenbergplatz 2  |       ...!pyramid!tub!tmh (World)
Germany            | BITNET: tmh%DB0TUI6.BITNET at DB0TUI11 or
+49-30-254 99 160  |         tmh at tub.BITNET