possible head crash :-(

John Mcmillan jcm at mtune.ATT.COM
Sat Jan 6 04:09:59 AEST 1990


In article <1134 at ursa-major.SPDCC.COM> gst at ursa-major.spdcc.COM (Gary S. Trujillo) writes:
  >Here's an update on the situation I described recently.
:
  >1. What's down there at block 2, which fsck reports it gets a read
  >   error on?  I would have assumed that the bootstrap loader (or
  >   whatever it's called) lives there.

        NOPE!  The bootstrap loader is in the low parts of /dev/rfp000.
        You should be running FSCK on /dev/[r]fp002.

        Logical (1K-byte) BLOCK#2 is the 1st INODE BLOCK.
                /dev/rfp002-  LOGBLK#0 = 1st 1/2 empty, 2nd 1/2==FS SuperBlock
                /dev/rfp002-  LOGBLK#1 = reserved
                /dev/rfp002-  LOGBLK#2 = 1st INODE BLOCK

        Following is A SAMPLE of the use of the 1st INODE BLOCK
        -- my /dev/rfp002
          ("ncheck -i 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 /dev/rfp002"):

        inode
        disk    inode
        index   number  file 'name'
                        ------------------ (1K-byte) block  #2
                        --------- (512-byte) block  #4
        0       -       <reserved>
        1       2       /.
        2       3       /bin/.
        3       4       /usr/lib/terminfo/d/dw4
        4       5       /usr/lib/terminfo/h/.
        5       6       /usr/lib/terminfo/h/hp262x
        6       7       /etc/.
        7       8       /usr/lib/terminfo/i/.

                        --------- (512-byte) block  #5
        8       9       /usr/lib/terminfo/i/intext
        9       10      /etc/lddrv/unix.sym
        10      11      /tmp/EXPORTS
        11      12      /dev/.
        12      13      /usr/lib/terminfo/i/intext2
                        /usr/lib/terminfo/i/intextii
        13      14      /usr/lib/terminfo/l/.
        14      15      /usr/lib/terminfo/l/lp
                        /usr/lib/terminfo/l/lpr
                        /usr/lib/terminfo/p/print
                        /usr/lib/terminfo/p/printer
                        /usr/lib/terminfo/p/printing
        15      16      /lib/.
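The table above implies a simple layout. Inodes 1-16 fill 1K logical block 2, eight to each 512-byte block, so each on-disk inode is 64 bytes. The sketch below turns that into arithmetic for locating a given inode. It assumes the inode list continues contiguously past block 2 (the table only shows the first block), and it uses a modern POSIX shell's `$(( ))` arithmetic; the 7300's Bourne shell would need `expr` instead. The name inode_location is made up for illustration.

```sh
# inode_location N: print where on-disk inode N lives, ASSUMING the
# layout in the table above (64-byte inodes; inode 1 at the start of
# 1K logical block 2, i.e. 512-byte block 4) and a contiguous inode list.
inode_location() {
    idx=$(( $1 - 1 ))             # the table's "disk index": inode number - 1
    blk1k=$(( 2 + idx / 16 ))     # 16 inodes per 1K logical block
    blk512=$(( 4 + idx / 8 ))     # 8 inodes per 512-byte block
    off=$(( (idx % 16) * 64 ))    # byte offset within the 1K block
    echo "inode $1: 1K block $blk1k (512-byte block $blk512), offset $off"
}
```

For example, `inode_location 2` (the root directory) reports 1K block 2 at offset 64. According to the table, that same block also holds /bin, /etc, /dev and /lib, so an unreadable block 2 affects all of them at once.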

  >2. What are the chances I might be able to re-write the boot loader
  >   with the "ldrcpy" utility, which is in the /etc directory on the
  >   "floppy file system disk" (3 of 12)?  There are some lines in
  >   /etc/profile  on this floppy which say:
:
        STOP!  Why are you doing this?!?!?!?  There must be more
        pleasurable ways of inflicting pain on yourself!?

        1) You SEEM to have an INTERMITTENT READ FAILURE.
                a) You CAN mount & LS the File-System, therefore
                        LOGICAL-BLOCK#2 & inode#2 CAN be read
                        some of the time.
                b) Yet you cannot reliably FSCK it.
                [ Note:
                        i)   Are you doing: fsck ... /dev/rfp002?
                                                          ^^^^^^
                                This MAY skip the re-try mechanism.
                        ii)  If so, "fsck ... /dev/fp002;reboot"
                                                   ^^^^^
                                might at least 'work' -- but it
                                is unlikely to FIX the problem.
                ]
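One hedged way to test the "intermittent" diagnosis is to read the same block several times and count how many reads succeed. This is only a sketch. read_tries is a hypothetical helper, and it assumes a modern POSIX shell and a dd that reports "N+M records in" on stderr (GNU and BSD dd both do). It says nothing about what the driver's own retry logic does. On the block device, repeated reads of one block may be answered from the buffer cache, so the raw device is probably the more telling target.

```sh
# read_tries DEV BLOCK TRIES: read 1K block BLOCK of DEV TRIES times and
# print "successes/TRIES".  A count strictly between 0 and TRIES points
# to an intermittent read failure rather than a block that never reads.
read_tries() {
    dev=$1 block=$2 tries=$3
    ok=0 i=0
    while [ "$i" -lt "$tries" ]; do
        # Capture dd's stderr (its record counts); discard the data itself.
        if out=$(dd if="$dev" bs=1024 skip="$block" count=1 2>&1 >/dev/null); then
            # Count the try only if one full 1024-byte record came back.
            case $out in
                "1+0 records in"*) ok=$((ok + 1)) ;;
            esac
        fi
        i=$((i + 1))
    done
    echo "$ok/$tries"
}
```

Something like `read_tries /dev/rfp002 2 20` on the unmounted disk would then give a rough success rate for logical block 2. Any result other than 0/20 or 20/20 would support the intermittent-read theory.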

        2) I didn't see anything that suggested the LOADER was
                damaged.  More likely, it just can't cope with
                intermittent read errors in trying to locate "/unix".

  >3. It turns out the file system is *not* completely OK.  I just ran
  >   an "ls -R /" for the bittersweet fun of it, and found there were
  >   several files (~ 10) which were reported "not found," amid the
  >   buzzing of recal attempts.  I assume the inodes of these files
  >   are inaccessible due to the damage which I now feel justified
  >   in imagining has taken place.

        Right.
        "ls /" just needs to read the DIRECTORY entries for inode#2.
        "ls -R /" or "ls -l /" needs to examine the INODE entry
                for each DIRECTORY entry.
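That distinction can be turned into a list of the damaged entries. This is a minimal sketch that assumes a modern POSIX shell and assumes plain `ls` lists a directory without examining each entry's inode, as described above (worth confirming on your own system). check_entries is a hypothetical helper. It lists the names, runs `ls -ld` on each one, and reports the ones that fail.

```sh
# check_entries DIR: list the names in DIR (needs only the directory's
# own data blocks), then "ls -ld" each entry (needs that entry's inode).
# Names that list but won't stat probably have unreadable inodes.
check_entries() {
    dir=$1
    total=0 bad=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # an empty DIR yields one blank line
        total=$((total + 1))
        if ! ls -ld "$dir/$name" >/dev/null 2>&1; then
            echo "unreadable: $dir/$name"
            bad=$((bad + 1))
        fi
    done <<EOF
$(ls "$dir")
EOF
    echo "$total entries, $bad unreadable"
}
```

Running it on each directory where "not found" showed up produces a list of the affected files. It only reads, so nothing is written to the damaged file system.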

  >4. Speaking of damage, I suddenly realized that I might be wise to
  >   not run the unit too long, since, if there was a head crash, there's
  >   probably oxide flying around inside, which is not good on heads
  >   which are designed to float really, really close to the surface of
  >   the disk platters.

        You probably DID NOT suffer a head crash: you probably vibrated
        the head to an unacceptable clearance while it was writing --
        thus making the signal inadequate.

  >5. Here's a puzzler for you all-- given the conditions I've described,
  >   how can I get files off the machine? Here's what I have:
  >
  >   A. A mostly-readable hard drive which I can't run a multi-user
        [^^^^^^^^^^^^^^^ -- an unproven statement as I read your notes.]
  >      system from at present; the only way I can run UNIX is from
  >      the (writable) floppy filesystem disk I made quite some time
  >      ago in anticipation of major problems like the present ones.
  >      I cannot boot from the hard drive even starting from the boot
  >      floppy (I tried).  I can't seem to write to the drive.
  >
  >   B. A 7300 with a 10meg drive (mostly filled, but I can probably
  >      get ~4K blocks free if I work at it).

If you can put a compiler on it, there's a great deal of test code
you can build IF it comes down to that... but that's for LATER on.
(You do NOT need libraries... just /lib/crts0.o and the '/lib/*ifile*'
stuff, I believe.)

:
  >7. Is there any point in attempting to reformat the drive, once I've
  >   gotten as much as I can from it, do you think, or would you just
  >   leave it in its current state.  (I know it's really up to me, but
  >   I'd sort of like to know what you'd do and why.)

Yes: reformatting is likely to recover the disk.  But WAIT, unless you
WANT to discard your data.

  >8. What do you guess are my chances of saving the drive?

Pretty good -- if you'll SLOW DOWN !-)
        (Amongst other things: [almost] NEVER try to construct
        files on a damaged system -- until FSCK has successfully run.)

I'd start by verifying your FSCK data:
^^^
        umount /dev/fp002 ; dd bs=1024 count=64 < /dev/fp002 > /dev/null
                                                       ^^^^^
                I'd expect the above to either succeed -- given the
                apparently intermittent nature of the problem --
                or to fail with a note that 2 blocks were copied.
                                            ^
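The single dd above stops at the first bad block. A report of "2 blocks were copied" tells you that block 2 failed, but it says nothing about the rest of the range. Below is a sketch of a block-by-block scan. It assumes a modern POSIX shell and a dd that prints "N+M records in" on stderr; read_ok and scan_blocks are hypothetical names. As with the command above, run it only on the unmounted file system.

```sh
# read_ok DEV BLOCK: succeed only if 1K block BLOCK of DEV reads in full.
read_ok() {
    out=$(dd if="$1" bs=1024 skip="$2" count=1 2>&1 >/dev/null) || return 1
    case $out in
        "1+0 records in"*) return 0 ;;
        *) return 1 ;;
    esac
}

# scan_blocks DEV FIRST COUNT: check 1K blocks FIRST .. FIRST+COUNT-1 one
# at a time, printing every bad one instead of stopping at the first.
scan_blocks() {
    dev=$1 b=$2 end=$(($2 + $3))
    while [ "$b" -lt "$end" ]; do
        read_ok "$dev" "$b" || echo "bad block: $b"
        b=$((b + 1))
    done
}
```

For example, `scan_blocks /dev/fp002 0 64` covers the same 64K as the dd command above and prints "bad block: N" for each 1K block that does not read in full.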

                If the above succeeds -- but with recalibration noises
                -- I'd be tempted to try a correction-by-rewriting.
                   ^^^
                (I'm tired of re-posting exactly "why" this often works.)

        umount /dev/fp002 ; dd bs=1024 count=64 < /dev/fp002 > /dev/fp002
                                                       ^^^^^        ^^^^^

                If THIS works... it's time to try the FSCK again!
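The command above rewrites all 64 blocks in one pass and stops at the first read error. It also relies on the shell's `>` not truncating a device; on a plain file, `>` would empty the file before dd read anything. Below is a more cautious per-block variant, which is not the author's method. Each block's read is retried up to a chosen number of times, and only a block that reads back in full is written back to the same place. A block that never reads cleanly is reported and left untouched, and the loop moves on. The sketch assumes a modern POSIX shell and a dd that supports `conv=notrunc`; rewrite_blocks and its scratch-file name are hypothetical. It writes to the disk, so use it only on the unmounted file system, as above.

```sh
# rewrite_blocks DEV FIRST COUNT TRIES: for each 1K block from FIRST on,
# retry the read up to TRIES times; once a block reads back as one full
# record, write those same bytes back to the same block.  A block that
# never reads cleanly is reported and left untouched.
rewrite_blocks() {
    dev=$1 b=$2 end=$(($2 + $3)) tries=$4
    tmp=${TMPDIR:-/tmp}/rewrite_blocks.$$   # scratch copy of one block
    while [ "$b" -lt "$end" ]; do
        i=0 got=no
        while [ "$i" -lt "$tries" ] && [ "$got" = no ]; do
            if out=$(dd if="$dev" of="$tmp" bs=1024 skip="$b" count=1 2>&1); then
                case $out in "1+0 records in"*) got=yes ;; esac
            fi
            i=$((i + 1))
        done
        if [ "$got" = yes ]; then
            # conv=notrunc stops dd truncating DEV when it is a plain file
            # (e.g. a test image); it has no effect on a device.
            dd if="$tmp" of="$dev" bs=1024 seek="$b" count=1 conv=notrunc \
                2>/dev/null || echo "write failed: $b"
        else
            echo "unreadable after $tries tries: $b"
        fi
        b=$((b + 1))
    done
    rm -f "$tmp"
}
```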

If the above DOESN'T WORK, or there are other issues, E-mail me.
There are too many scenarios to describe.  (Before contacting me,
run "dd bs=512 count=128 < /dev/rfp002 > /dev/null" and send me the results.
                                ^
Also, double-check that you followed the "/dev/fp00x" vs. "/dev/rfp00x" details exactly!)

  :

john mcmillan -- att!mtune!jcm -- Speaking for SELF, not AT&T