3B1 Boot Loader Story (long)

Mark Dapoz mdapoz at hybrid.UUCP
Tue Jul 11 13:28:22 AEST 1989


I recently ran across a rather interesting "feechur" of the 3B1 boot loader.
After successfully installing a second hard drive using Gil's instructions,
I decided to replace the stock Miniscribe 6085 with a faster drive.  In doing
so I prepared the new drive (format, verify, allocate, etc.) using it as the
second hard drive and then mounted it and cpio'ed the data to it from the first
drive.  Somehow in doing all this I messed up and managed to allocate the
second partition (the page partition) as 0 blocks.  This meant that the
page partition and user partition both started at the same location on the
disk!  Of course I was already halfway through copying the data to the new
disk before I realized this, so I immediately stopped the cpio and reallocated
the drive partitions using iv and mkfs.  I then remounted the drive and started
the cpio all over again.  Once done, everything was fine.  The drive booted when
installed as the primary drive and all was fine..... until the next day.
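
To make the overlap concrete, here is a minimal Python sketch.  It is not
taken from iv or any other 3B1 tool; it simply assumes that partitions are
laid out back to back in order, and the block counts are made-up numbers
chosen for illustration.  With a 0-block page partition, the user partition
starts at the very same block.

```python
def partition_starts(sizes, first_block=0):
    """Return the starting block of each partition, assuming the
    partitions are laid out contiguously in the order given.
    (Assumption for illustration, not the actual iv algorithm.)"""
    starts = []
    block = first_block
    for size in sizes:
        starts.append(block)
        block += size
    return starts

# Hypothetical sizes in blocks: [page partition, user partition]
intended = partition_starts([5000, 60000])
mistaken = partition_starts([0, 60000])

print("intended starts:", intended)   # page and user start at different blocks
print("mistaken starts:", mistaken)   # page and user start at the same block
```

In the mistaken layout a filesystem made on the user partition begins exactly
where the page partition begins, so its superblock sits at the start of both.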

Sometime in the afternoon there was a power hit and the system was forced to
reboot.  The familiar boot loader message came up and the "#"'s came across the
screen as the kernel was loaded, then nothing.  Hmmm, stick the floppy boot
disks in, mount the drive and all looks fine.  fsck doesn't complain, kernel
looks ok but still doesn't boot, so I restored the kernel from the original
foundation disks as /newkern.  Rebooted from the floppies and specified /newkern
on the HD as the kernel and up it came with no problems.  Fine, link /unix
to /newkern and reboot.  Same problem appears again and the kernel is hung!
Hmmm, figuring the link must have failed, I rebooted again from floppy and
checked the inode numbers of /unix and /newkern, sure enough they were the
same!  It was now about 5 hours since the machine tried rebooting and I was 
getting rather desperate as to what to do next.  As a last shot before 
reformatting the drive and starting over I decided to use fsdb to look around
the filesystem for anything strange.  All looked fine until I specified the
page partition as the filesystem to debug (don't ask why, I was desperate :-).
Lo and behold fsdb found a filesystem and began to show me files!  What, a 
filesystem on a page partition!  Yes, it was my original filesystem that I 
created the first time around.  When I used iv to rebuild the partitions it 
didn't remove any of the data so it was still there mostly intact.  

Now it's all starting to come together: it seems the boot loader looks at all 
the partitions on the drive, one at a time, looking for the name of the 
kernel you specified.  Since the default kernel name is /unix, it found this 
name on the invalid filesystem on the page partition and tried loading it.  Of 
course when I first built the system most of the data for the filesystem on 
the page partition was still intact because I hadn't had enough activity to 
cause paging to occur.  But overnight the news expire probably caused paging 
which overwrote the data for /unix, but NOT the superblock for the filesystem.
So you see, the boot loader ended up reading a very bad copy of /unix on the 
wrong filesystem which completely overrode the /unix on my user filesystem.
Once I invalidated the superblock on the page partition using fsdb all was
working fine again.  Ah, the joys of a 3B1....... :-)
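
The behaviour described above can be modelled with a short Python sketch.
This is only a toy model of what the loader appears to do, inferred from the
symptoms; it is not the real boot loader code.  The Partition class, the
find_kernel function and the file contents are all hypothetical names
invented for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    name: str
    has_valid_superblock: bool
    files: dict = field(default_factory=dict)  # path -> contents

def find_kernel(partitions, kernel_name="/unix"):
    """Scan partitions in disk order and return (partition name, contents)
    for the first one that looks like a filesystem and holds kernel_name.
    Returns None if no partition has it."""
    for part in partitions:
        if not part.has_valid_superblock:
            continue  # does not look like a filesystem, skip it
        if kernel_name in part.files:
            return part.name, part.files[kernel_name]
    return None

# Leftover filesystem on the page partition: superblock intact, /unix trashed.
page = Partition("page", True, {"/unix": "stale kernel, overwritten by paging"})
user = Partition("user", True, {"/unix": "good kernel", "/newkern": "good kernel"})
disk = [page, user]  # page partition comes first on the disk

before_default = find_kernel(disk)              # finds the stale copy first
before_newkern = find_kernel(disk, "/newkern")  # only exists on the user fs

page.has_valid_superblock = False               # what the fsdb fix amounts to
after_default = find_kernel(disk)               # now falls through to user fs

print(before_default, before_newkern, after_default)
```

In this model /newkern boots because that name never existed on the leftover
filesystem, while /unix is shadowed until the stale superblock is invalidated.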
-- 
  Mark Dapoz  (mdapoz at hybrid.UUCP)  ...uunet!{mnetor,dptcdc}!hybrid!mdapoz

I remind you that humans are only a tiny minority in this galaxy.
	   -- Spock, "The Apple," stardate 3715.6.


