Segmentation Faults in XENIX 2.3.3

Greg Wettstein NU013809 at NDSUVM1.BITNET
Thu Mar 14 23:58:20 AEST 1991


Our group is being plagued by segmentation faults (Signal 11), and I am
wondering whether anyone else has experienced similar problems.  I originally
attributed the problem to faulty hardware, but I am now beginning to
consider other causes.

The original problem was experienced on an ALR 386/220 with 6 Mbyte of
memory, and I am convinced that this was due in large part to faulty memory.
Every ALR machine we have touched in this group has had memory problems,
which appear to be related to DMA.  Even motherboard replacements that
were 'guaranteed to fix the problem' failed to stop them.  Besides
generating segmentation faults, these faults would occasionally bring the
machine to an instant, devastating halt.  At this point we were running
XENIX 2.3.2 with the UFJ update for VPIX, yielding a kernel release of
2.3.3.

The problem got so bad that it became necessary to change out the hardware.
The machine we opted for was a Gateway 2000 running at 33 MHz with 8 Mbyte
of memory and 64 kbyte of cache.  The disk system is a CompuAdd caching
ESDI disk controller (2 Mbyte) connected to a 350 Mbyte Wren.  Also in
the machine are a Mountain tape card, a Multitech 224EC internal modem, and
a 12-port Equinox Megaport card.  The drive and controller came from the
old machine, and when we cranked up the system it made it through the
boot sequence but crashed with a memory error when I tried to log in.

The first thing that went through my mind was that the memory board was
probably loose.  The Gateway 2000 is based on a Micronix motherboard, which
has all its memory sitting on a card that plugs into a special 32-bit
slot on the motherboard.  I pulled the memory board, reseated it, and
started up again.  This time I logged in, but I experienced numerous
segmentation faults and two crashes as I tested the machine for the rest
of the afternoon.

It seemed to stabilize the next day (Friday), but when I came in on Monday
morning it was sitting dead with a kernel panic on the screen.  We have
now been using the machine for a month, and while performance is excellent,
we are still experiencing enough segmentation faults and an
occasional panic that we cannot put the machine into production.

Yesterday I installed the xnx155b upgrade, thinking that there might be a fix
embedded somewhere in it that would solve my problems.  I had noted that
a couple of the SLS upgrades available in the sco-archive directory on
uunet mentioned that they corrected problems when XENIX was
run on various types of motherboards.  The machine ran through the
night, but this morning when I fired up emacs it persistently gave me
segmentation faults.  I ran shutdown from root and rebooted (without cycling
power), and things were fine.

I am presently in a quandary over whether to call in the technical support
people and claim memory problems or to look for OS problems.  I have written
C programs that malloc large blocks (>1 Mbyte) of memory and fill/refill these
blocks in various combinations, reallocing, etc., trying to flush out memory
faults, but I cannot seem to force failures consistently.  I have spawned
several of these until the machine was forced into severe swapping, to the
point where the swap area was completely filled, and they generated no
panics.  A little while later I will be halfway through a large set of
compilations and gcc will dump after catching a segmentation fault signal.
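For readers who want to try the same kind of test, here is a minimal sketch
in modern standard C of a malloc/fill/realloc/verify pass like the one
described above.  It is not the original test program: the block size, the
byte pattern, and the names BLOCK_SIZE, fill_pattern, check_pattern and
stress_pass are all hypothetical.  Note that malloc hands back virtual
memory, so a user-level test has no control over which physical pages it
touches; that may be part of why such tests fail to reproduce intermittent
faults on demand.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block size; the post only says ">1 Mbyte". */
#define BLOCK_SIZE (1536UL * 1024UL)   /* 1.5 Mbyte */

/* Fill a block with a pattern derived from the byte offset and a seed,
 * so that stuck or flipped bits show up as mismatches on readback. */
static void fill_pattern(unsigned char *buf, size_t len, unsigned seed)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)((i * 31u + seed) & 0xFF);
}

/* Count the bytes that no longer hold the expected pattern. */
static size_t check_pattern(const unsigned char *buf, size_t len,
                            unsigned seed)
{
    size_t i, bad = 0;
    for (i = 0; i < len; i++)
        if (buf[i] != (unsigned char)((i * 31u + seed) & 0xFF))
            bad++;
    return bad;
}

/* One stress pass: malloc, fill, verify, realloc to twice the size
 * (realloc must preserve the first `size` bytes), verify the preserved
 * part, refill the whole grown block, and verify again.
 * Returns the total mismatch count, or (size_t)-1 if allocation fails. */
static size_t stress_pass(size_t size, unsigned seed)
{
    unsigned char *buf, *grown;
    size_t bad;

    buf = malloc(size);
    if (buf == NULL)
        return (size_t)-1;
    fill_pattern(buf, size, seed);
    bad = check_pattern(buf, size, seed);

    grown = realloc(buf, size * 2);
    if (grown == NULL) {
        free(buf);
        return (size_t)-1;
    }
    buf = grown;
    bad += check_pattern(buf, size, seed);      /* contents survived move? */

    fill_pattern(buf, size * 2, seed + 1);      /* refill with a new seed */
    bad += check_pattern(buf, size * 2, seed + 1);

    free(buf);
    return bad;
}
```

A driver would call stress_pass in a loop with varying seeds, print any
nonzero result, and be run as several concurrent copies to push the machine
into swapping, as in the experiment described above.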

I am getting ready to pull the Equinox card and its drivers out to see whether
or not they could be the root of the problem.  This card has performed
flawlessly, so I am not very quick to point a finger at it.  I re-jumpered
the motherboard to disable caching of the memory region into which the card
maps its buffer and control blocks, so that should not be a problem.  The
only reason I suspect an interrupt problem is that when uusched kicks off
uucico to poll one of the neighboring sites, uucico will fail and dump core,
presumably due to a segmentation fault.  Occasionally, when a neighboring
site calls in to poll us, a similar event will happen.  I should mention
that the modem in question is on a serial port, not one of the Equinox
ports.  But when one is chasing ghosts, all corners should get investigated...

I would be interested in whatever commentary the net is willing to offer.
There are bunches of XENIX sites out there, so I am hoping that somebody may
have experienced this problem.  If there are no experiences with this type
of phenomenon, then I have to turn up the heat on Gateway.  My boss keeps
telling me, "But they've burned these machines in, how can we have any
problems....".

   'Tiz a protected mode operating system my friend....'

Any information would, as always, be deeply appreciated.

                            As always,
                            Dr. G.W. Wettstein
                            Oncology Research Division Computing Facility
                            Fargo Clinic / MeritCare

                            UUCP: uunet!plains!wind!greg
                            INTERNET: greg%wind.uucp at plains.nodak.edu
                            Phone: 701-234-2833

`The truest mark of a man's wisdom is his ability to listen to other
 men expound their wisdom.'
