Sun-4 slows down like molasses

Leonard Sitongia sitongia at hao.ucar.edu
Fri Dec 16 02:27:23 AEST 1988


System:         Sun-4/280S
OS:             4.0
Category:       OS/ rlogind (perhaps only indirectly related)
Date:           5 Dec 88

(Other hardware installed in this system includes: Sun SCSI and Exabyte
cartridge tape drive, Integrated Solutions (xt interface), two SMD disk
drives and two SCSI disk drives, 3 ALM-II boards, Hyperchannel interface.)

The symptoms of this problem are similar to those described by Loki
Jorgenson @physicsa.mcgill.ca (Vol 7, Issue 33, message 3 of 12) in the
NFS disk wait slowdown problem, but the cause is different.

We have seen a bizarre phenomenon on this machine.  Periodically (it
happened three times today) the system appears to have died: sessions on
directly connected terminals and rlogins hang, and when an attempt is made
to log in there is usually no response.  Sometimes the login is allowed but
no motd is printed (nor anything else the user may do in the .login), and
then the terminal hangs.  The system appears to have become tremendously
busy.

It is possible to rsh to this machine in this situation, from other
machines.  In fact, one can "log in" by starting up csh through rsh (rsh
target /bin/csh -i).  From this we can look at what is going on on the
target.

The only unusual behavior is that the in.rlogind's associated with
existing rlogin's are running away, gobbling up lots of cpu time (I mean
lots! very unusual) and the LEDS on the cpu board do their "converging"
pattern, but *very slowly*.  Also, there is often a tremendous number of
system calls per second (on the order of 3-5 THOUSAND!) occurring.  

The number of interrupts per second is normal.  The load average is small
(1-5).
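
One way to pick out the runaway daemons described above is to sample each
process's accumulated CPU time twice and compare.  The following is only a
minimal sketch in Python, not something that was run on the machine in
question: the function name, the 0.5 threshold, and the sample figures are
all hypothetical, and the samples would have to be collected by hand (for
example from ps output gathered over rsh).

```python
def runaway(before, after, interval_s, threshold=0.5):
    """Flag processes whose CPU use over the interval exceeds `threshold`.

    `before` and `after` map pid -> (command, accumulated CPU seconds).
    Returns (pid, command, cpu_fraction) tuples, busiest first.
    """
    flagged = []
    for pid, (cmd, cpu_after) in after.items():
        if pid not in before:
            continue  # process started during the interval; no baseline
        cpu_before = before[pid][1]
        frac = (cpu_after - cpu_before) / interval_s
        if frac > threshold:
            flagged.append((pid, cmd, frac))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical samples taken 10 seconds apart (invented numbers).
before = {101: ("in.rlogind", 120.0), 102: ("in.rlogind", 3.0), 103: ("csh", 1.0)}
after = {101: ("in.rlogind", 128.0), 102: ("in.rlogind", 3.1), 103: ("csh", 1.0)}

for pid, cmd, frac in runaway(before, after, interval_s=10.0):
    print(pid, cmd, round(frac, 2))
```

With these made-up figures only pid 101 is flagged, since it used 8 of the
10 seconds; the other in.rlogind barely moved.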

We can then disable in.rlogind from inetd.conf and kill all the running
in.rlogind's and the LEDS will go back to their normal speed of
"converging".  In fact, I've killed just about everything on the system,
so that vmstat, iostat, etc. show little activity, 99% idle cpu, typical
numbers of interrupts and system calls per second...

...but the system never returns to normal...

...well, I should qualify that: once it returned to normal after about 5
minutes with *no* intervention (on its own).  Perhaps this was an
unrelated type of slowdown.

So we have to reboot.  Unfortunately, dumps that are generated by breaking
and then typing "g0" at the console monitor *always* have no u-area and
only trace back to the panic (panic 0, the break).

Have others seen this problem?  Does anyone know what is causing this?

We have received the 4.0.1 patches but have not installed them.
[[ Do it.  There are quite a few bugs fixed in the upgrade.  --wnl ]]

Thank you for your time,

-Leonard E. Sitongia    System Programmer		 (303) 497-1509
USPS Mail: High Altitude Observatory P.O. Box 3000 Boulder CO  80307
Internet:               sitongia at hao.ucar.edu
SPAN:			NSFGW::"hao.ucar.edu!sitongia"	[NSFGW=9580]


