Sun File System Problems

Ted Kyriakakis ontologic!tpk at uunet.uu.net
Sun Dec 3 06:51:34 AEST 1989


For the past week we have been having file system corruption problems on
our mail gateway system.  The system is a 3/180 with 2 Fujitsu 451 disk
controllers (the first controller has an eagle and a cdc 9771 drive, the
second controller has a super-eagle), and a mag tape drive.  It has been
running SunOS 4.0.3 for several months and supports several diskless
clients.

Curiously, our problems seem to happen only on weekends (the past two).
The xy0 (eagle) disk seems to be getting munged: the operating system
reports generic "I/O errors", we reboot the system, and the subsequent
fsck produces DUP block errors on practically every partition on xy0.
Occasionally we also get an xy1 "offline" error, but the xy1 file systems
are fine.  The disk on the second controller never has problems.

When I first looked into the matter, I discovered that vmunix had been
modified (corrupted?).  I am not sure whether this is just another symptom
of the file system damage or whether it may be the cause.  We now keep a
spare vmunix on another disk and compare it against vmunix as a flag that
the system is about to have, or is already having, problems.  As long as
someone is around, this helps us minimize the file system corruption that
occurs.
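
For anyone who wants to set up the same kind of check, here is a minimal
sketch of the comparison in Python.  It is not the procedure we actually
run; the script name, the function names, and the exit codes are all made
up for illustration, and the two paths are whatever you pass on the
command line.

```python
import filecmp
import sys


def kernel_matches(live_path, reference_path):
    """Return True if the two files are byte-for-byte identical."""
    # shallow=False forces a comparison of file contents instead of
    # trusting the size and modification time reported by stat().
    return filecmp.cmp(live_path, reference_path, shallow=False)


def main(argv):
    # Hypothetical command line: check_kernel.py LIVE REFERENCE
    if len(argv) != 3:
        print("usage: check_kernel.py LIVE REFERENCE", file=sys.stderr)
        return 2
    live, reference = argv[1], argv[2]
    try:
        same = kernel_matches(live, reference)
    except OSError as err:
        # An unreadable or missing file is itself worth flagging.
        print("cannot compare: %s" % err, file=sys.stderr)
        return 2
    if same:
        return 0
    print("WARNING: %s differs from %s" % (live, reference))
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run periodically, a nonzero exit status is the flag: 1 means the two
copies differ, 2 means the comparison itself could not be made.  The
reference copy should live on a disk that is not showing problems.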

But we still do not know what is causing the problem.  I can think of
three possible general causes:

1) hardware (CPU, xy0 disk, the disk controller, or the disk cables): But
no hardware errors are being reported and the disk drive has not been
indicating any faults.  I would also expect this type of fault to cause
more regular (daily) problems.

2) OS software problem (SunOS 4.0.3 bug): But we were running for about 3
months without any noticeable problems.

3) virus or outside break-in: But it has not spread to any of our other
hosts, which would be quite simple once the mail host was breached.

Or it could be a combination of the above or one of the above which is
precipitated by some other external factor such as a power surge.  As you
can probably tell, I am grasping at straws.  If you have experienced
similar problems, or know of problems with SunOS 4.0.3 which could be the
cause, or know of any viruses or break-ins going on which have exhibited
similar symptoms, please let me know.  I will summarize the responses to
the newsgroup if interest warrants.

In the meantime, we will be waiting and watching to see if we can pinpoint
the circumstances surrounding the start of the problem.  And if that fails
to turn up anything, I guess we will start switching out the hardware
and/or going back to the previous version of SunOS.



More information about the Comp.sys.sun mailing list