ie1 problems on Sun 4/280 solved

Hans van Staveren mcvax!cs.vu.nl!sater at uunet.uu.net
Sat Dec 10 00:27:10 AEST 1988


About two months ago we had serious problems with the ie1 board on a Sun
4/280: it lost large numbers of packets, and the IP queue would fill up
and never empty again. We run Sys4-3.2EXPORT, and Sun Netherlands was
supposed to figure it out. Well, they didn't; we did.

The first thing I thought of when I saw the symptoms was a race.  I asked
Sun whether the interrupt priority of the board was right, and they
claimed it was. Now, two months and a lot of pain later, I have found out
that the interrupt priority is wrong, although the problem is more subtle
than I originally suspected.

Bear with me while I go technical for the next three paragraphs:

In the SunOS kernel all networking is supposed to be done at CPU priority
splimp() or higher, to prevent devices from interrupting critical queue
manipulations. On Sun 3 workstations splimp() is level 3 and ie0 and ie1
also interrupt at level 3, so all is well.  The SPARC chip in the Sun4 has
twice as many interrupt levels as the MC68020 in the Sun3, and Sun
devised a way to map the VMEbus interrupt request levels to SPARC
interrupt levels.

It *seems* that all offboard interrupts come in at odd levels (1, 3, 5,
7, ...) and all onboard interrupts at even levels (2, 4, 6, 8, ...).  This
means that the onboard ie0 and the offboard ie1 *cannot* interrupt at the
same level: ie0 comes in at level 6, and ie1 at level 5.  On the Sun4
splimp() is level 6.
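
To make the level arithmetic concrete, here is a small user-space C model
of it. It is only a sketch: the names (IE1_LEVEL, can_interrupt(), and so
on) are made up for illustration and are not taken from Sun's source. The
level numbers are the ones observed above, and the only rule assumed is
that a request gets through when its level is strictly higher than the
current CPU level.

```c
#include <assert.h>
#include <stdbool.h>

/* Levels as observed in this post for the Sun4; illustrative names only. */
enum {
    IE1_LEVEL    = 5,   /* offboard (VMEbus) Ethernet board */
    IE0_LEVEL    = 6,   /* onboard Ethernet interface */
    SPLIMP_LEVEL = 6    /* level the CPU runs at after splimp() */
};

/* Assumed rule: a request is delivered only above the current CPU level. */
static bool can_interrupt(int cpu_level, int irq_level)
{
    return irq_level > cpu_level;
}

/* Self-check of the three cases that matter here. */
static void check_levels(void)
{
    /* Inside the ie1 handler (level 5), ie0 can still get in. */
    assert(can_interrupt(IE1_LEVEL, IE0_LEVEL));
    /* At splimp() level neither board can get in. */
    assert(!can_interrupt(SPLIMP_LEVEL, IE0_LEVEL));
    assert(!can_interrupt(SPLIMP_LEVEL, IE1_LEVEL));
}
```

The first assertion is the whole problem in one line: level 5 does not
mask level 6.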

Now this would still have worked if the interrupt routine for ie1,
running at level 5, had made a call to raise the level to 6.  Almost
needless to say, this call is not there.  The effect of all this is that
while ie1 is queuing packets, ie0 can still interrupt, destroying the
consistency of the system.
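
The missing call can be illustrated with another user-space simulation.
Again this is a sketch under assumptions, not Sun's driver: raise_pri(),
restore_pri() and ie1_interrupt() are hypothetical names, and the "queue
update" is reduced to the single moment at which an ie0 request arrives.
In a real BSD-style kernel the raise and restore would be the usual
s = splimp(); ... splx(s); pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Levels as observed in this post; the names are illustrative. */
enum { PRI_BASE = 0, PRI_IE1 = 5, PRI_IE0 = 6, PRI_IMP = 6 };

static int cpu_pri = PRI_BASE;  /* simulated CPU interrupt level */
static int preemptions;         /* ie0 requests delivered mid-update */

/* Raise the simulated level (never lower it) and return the old one. */
static int raise_pri(int level)
{
    int old = cpu_pri;
    if (level > cpu_pri)
        cpu_pri = level;
    return old;
}

/* Restore a level previously returned by raise_pri(). */
static void restore_pri(int old)
{
    cpu_pri = old;
}

/* An ie0 request is delivered only if it is above the current level. */
static void ie0_request_arrives(void)
{
    if (PRI_IE0 > cpu_pri)
        preemptions++;
}

/* Simulated ie1 handler; 'protect' selects whether it raises to PRI_IMP. */
static int ie1_interrupt(bool protect)
{
    int entry = raise_pri(PRI_IE1);  /* handler is entered at level 5 */
    int saved = protect ? raise_pri(PRI_IMP) : cpu_pri;

    preemptions = 0;
    ie0_request_arrives();           /* arrives in the middle of the update */

    restore_pri(saved);
    restore_pri(entry);
    assert(cpu_pri == entry);        /* level is back where it started */
    return preemptions;
}
```

In this model ie1_interrupt(false) reports one ie0 delivery in the middle
of the update, and ie1_interrupt(true) reports none.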

End of technical mode.

I am annoyed. I was right within a minute, yet I had to suffer for two
months and then figure it out myself, without documentation or source.
Does Sun assume all customers are dumb? They could at least have checked
it; I suggested the priority several times as a possible cause.

The strangest thing is that this must have happened to lots of other
people, but an earlier message to this worthy list brought up nothing.

Is there anybody out there who has seen this before?

	Hans van Staveren
	Vrije Universiteit
	Amsterdam, Holland


