Obscure problem/bug with init on 3B1's

Mark Dapoz mdapoz at hybrid.uucp
Wed May 9 15:16:41 AEST 1990


Here's yet another story of an obscure problem I just came across on my 3B1.

First the situation.  When I arrived home from work today I noticed that the
DTR light on the 'blazer was off and my 3B1 wasn't answering any incoming
calls.  Upon doing some initial investigation I found that I no longer had
a uugetty process running on the modem port.  This struck me as strange 
since init is responsible for restarting the uugetty and there was no
indication of problems such as "process respawning too quickly" (and I
haven't touched uugetty or inittab in quite a while now).  I tried sending
init a signal to switch to run level 2 (which it should already be in) but
that had no effect; it seemed init was somehow hung.  I was in a bit of a
rush to get out tonight (a sure sign that something is going to go wrong :-)
so I just decided to reboot the machine in the hopes that it would fix
itself.  Well, when I issued a shutdown command the system didn't even
attempt to shut down, thus confirming my suspicion that init was dead.  I
did a manual shutdown and managed to bring the system down in an orderly
fashion.  However, upon rebooting, the system failed to initialise completely
and it seemed to get hung just after init was started.  Wonderful, now I
have a completely dead system instead of one with just a dead init.  Over the
next 2.5 hours I then tried every possible remedy to get it to completely boot 
up.  I installed a new init from the distribution disks, recreated the inittab,
went back to previous known working kernels, installed a backup copy of the 
shared library, ran every possible hardware test, started pulling cards, etc. 
but nothing worked.  The system just kept stopping once init was started.  If 
I removed init, then upon booting I would get a shell so the kernel seemed to 
be ok; it's just that init was very sick.  I ended up digging through the man pages
for init to figure out exactly what it did upon boot (maybe I was missing
something all these years).  The man page makes a reference to /etc/wtmp
and /etc/utmp as files used by init to log information.  I then checked these
files and found, to my surprise, a file called /etc/utmp.lck!  Ah ha, a lock
file!  It seems init at some point created a lock file for utmp and it never
removed it.  It also seems that init isn't quite bright enough to know that
it should remove this file upon boot so it just happily sat there waiting
for it to disappear.  Of course once I removed the lock file my system booted
quite happily.  Now, why would init ever create a lock file, and why doesn't
it know to remove it when the system boots up?  It was quite frustrating
to spend 3 hours on a floppy-based Unix digging around to find this.  I hope
this experience may help someone else if they ever have the misfortune of
getting into such a situation.
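
For anyone who wants to guard against a leftover lock file like this, here is
a rough sketch in Python of the kind of check a boot-time cleanup step could
make.  It is an illustration only, not how init itself works.  The function
names and the boot_time parameter are made up for the example, and it assumes
that any *.lck file last modified before the current boot is stale and safe to
remove, which you should verify for your own system before deleting anything.

```python
import os


def find_stale_locks(directory, boot_time, suffix=".lck"):
    """Return lock files in `directory` last modified before `boot_time`.

    `boot_time` is seconds since the epoch and is supplied by the caller
    (hypothetical parameter; how you obtain it depends on your system).
    """
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files with the lock suffix are considered.
        if name.endswith(suffix) and os.path.isfile(path):
            # Assumption: a lock older than this boot has no living owner.
            if os.path.getmtime(path) < boot_time:
                stale.append(path)
    return stale


def remove_stale_locks(directory, boot_time, suffix=".lck"):
    """Delete the stale lock files and return the paths that were removed."""
    removed = []
    for path in find_stale_locks(directory, boot_time, suffix):
        os.remove(path)
        removed.append(path)
    return removed
```

In my case, calling find_stale_locks("/etc", boot_time) with the time of the
current boot would have listed /etc/utmp.lck straight away, provided the lock
really did predate the reboot.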

-- 
Managing a software development team 	|   Mark Dapoz  
is a lot like being on the psychiatric	|   mdapoz%hybrid at cs.toronto.edu
ward.  -Mitch Kapor, San Jose Mercury	|   ...uunet!mnetor!hybrid!mdapoz


