MCP & Rimfire Incompatibility?

Roger Burroughes roger at cogsci.ed.ac.uk
Tue Feb 19 05:43:00 AEST 1991


PROBLEM:

We have an apparent incompatibility between an MCP board and a Ciprico
Rimfire 3220 SMD Disk Controller (supporting a CDC Sabre 1.2Gb disk) on a
Sun 4/330 running SunOS 4.0.3 - causing repeated "panic: Text fault ...
BAD TRAP" crashes.

DETAILED DESCRIPTION:

The original set-up was all Sun stuff (i.e. internal disk plus MCP
hardware) which ran reasonably smoothly. A few weeks later, the Rimfire
controller and associated software were added (7.Dec.89), and all seemed
well. Shortly after this (19.Dec.89), I got round to installing the MCP
software (version 6.0), together with X.25 software (version 6.0) and Sun
Coloured Book software (version 2.0.1) - again, all seemed to run smoothly
after this.

Our first related error occurred on 5.Jan.90 when we lost external comms
with the error message "mcph0: xmit hung..." and - at the same time - a
single "rfintr: Hard error..." report, which wasn't repeated, so we put it
down to a stray cosmic ray :-).

Our first crash happened on 8.Jan.90, with:

	le0: Transmission stopped
	le0: csr: 2e3<TINT,...
	BAD TRAP
	rlogin: Text fault
	kernel read fault...

At this point, Sun suggested that we might need a newer revision level CPU
(as an aside, the Lance Ethernet bug seen at the same time was later
fixed). I believe the Sun was in the middle of dumping to a remote Exabyte
drive when it crashed. Next crash was on 10.Jan.90, again while dumping to
Exabyte:

	panic: Text fault
	BAD TRAP
	nfsd: Text fault
	kernel read fault at addr=0x0,...

	[A possible red herring emerged at this point, since we had
	another 4/330 which crashed with similar errors:

		BAD TRAP
		nfsd: Data fault
		kernel read fault at addr=0x1d2c,...

	- THIS 4/330 had a Rimfire controller (3223/3224) plus disks, but 
	*NO* MCP board.]

The system was now crashing every day or two, and there appeared to be a
high - but not exclusive - correlation between crashes and dumps.  We sent
a core dump to Sun, who could find no obvious problem - there was some
SLIGHT indication that the problem lies with the MCP/X.25 software, plus a
TENTATIVE suggestion that the MCP board may not work well on a 16MIPS
machine.

A Sun engineer checked over the hardware (CPU & MCP boards), moved the MCP
board to slot 3 and the Rimfire controller to slot 2, and installed an
extra 16Mb of memory (25.Jan.90). We had the Rimfire controller replaced
(8.Feb.90) as well as the CPU board, but the crashes still continued - so
we assumed that the problem wasn't due to faulty hardware.

Only one thing left to do - remove all third party hardware (i.e. Rimfire
controller and disk). This was done on 6.Mar.90 - and the situation
improved. No more crashes. The next thing Sun suggested was to reinstall
the Rimfire controller & disk, and remove the MCP board - however this was
not a viable option with our setup, and so was not done. We did not,
unfortunately, try the slightly newer Rimfire controller (3223/3224) that
we had on another machine.

The system has been reasonably stable in the MCP-but-no-Rimfire
configuration, so we accepted the situation and left well alone. However,
we now want to add an extra medium/large disk to the machine in question,
and the obvious thing to do seems to be to re-install the Rimfire
controller and add a disk to that - we'd rather not have to buy another
SCSI disk. Has anyone seen similar interaction problems between an MCP
board and a Rimfire controller? Anyone found a fix? Does anyone know if
newer versions of hardware and/or software have cured this problem?

Please reply by email, and I'll summarise if there's any interest.

Roger Burroughes   Phone: +44 31 650 4447       | University of Edinburgh
UUCP:   ...!uunet!mcvax!ukc!its63b!cogsci!roger | Centre for Cognitive Science
ARPA:   roger%cogsci.ed.ac.uk at nsfnet-relay.ac.uk| 2 Buccleuch Place
JANET:  roger at uk.ac.ed.cogsci                   | Edinburgh EH8 9LW Scotland


