disk timeout, SCSI reset...

Sat Apr 6 08:56:00 AEST 1991

PROBLEM SUMMARY:
---------------
WREN VI hard drive as "dks0d2".

    sc0,2,0: timeout after 30 sec.  Resetting SCSI BUS
    dksc0d2s7: retrying request
    dksc0d2s7: retrying request
    dksc0d2s7: retrying request

    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: sc0,2,0: timeout after 30 sec.  Res

    bru "filename" warning - X block checksum error
    bru "filename2" warning - block sequence error
    bru "filename3" warning - file synchronization error - attempting recovery.

BACKGROUND LEADING TO PROBLEM:
-----------------------------
We received a Seagate Wren VI drive from PARITY systems, already formatted
and partitioned for our Personal IRIS (4D/35).  It was even set up to be
used as disk "2" (dks0d2).

I installed it on the PI and used "fx" to confirm it was partitioned.  It
is partitioned the same way SGI partitions its own drives (16 MB for root,
50 MB for swap, and the rest for "/usr").

Since partition 7 represents the entire disk ("/" + swap + "/usr"), I used

    mkfs /dev/dsk/dks0d2s7

That worked fine.  I then issued

    ln /dev/dsk/dks0d2s7 /dev/usr2

and the same for the raw device.

    mount /dev/usr2 /usr2

was also successful, as was writing small files to the disk.

I then wanted to move all users from /usr/people to /usr2/people.  For
some reason, I felt like using bru, so I issued

    cd /usr
    bru -cvf /usr2/bru.dat people

This worked fine and created a file of 77 MB.

    cd /usr2
    bru -xvf bru.dat

started normally, BUT at several intervals (at 13, 21, 23, 30.6, 30.8, 32.3,
38.4, 47.5, 59, 65.3, 75.9 Megabytes), the extraction from the file stopped,
then when it started again the messages

    sc0,2,0: timeout after 30 sec.  Resetting SCSI BUS
    dksc0d2s7: retrying request
    dksc0d2s7: retrying request
    dksc0d2s7: retrying request

would appear on the console.  The /usr/adm/SYSLOG file (the last 20 lines)
looks like:

    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: sc0,2,0: timeout after 30 sec.  Res
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: sc0,2,0: timeout after 30 sec.  Res
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: sc0,2,0: timeout after 30 sec.  Res
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: sc0,2,0: timeout after 30 sec.  Res
    Apr  5 16:27:24 nrcbs3 grcond[471]: CIO: dks0d2s7: retrying request
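
To put numbers on how often this happens, something like the following
could tally the two message types in the log.  This is only a rough sketch:
the function name count_scsi_errors is made up for illustration, and it
assumes the message wording matches the excerpt above.

```shell
# Hypothetical helper: count SCSI timeout and retry messages in a log file.
# Assumes the messages read "timeout after 30 sec" and "retrying request",
# as in the SYSLOG excerpt; other wording will not be counted.
count_scsi_errors() {
    logfile=$1
    # grep -c prints the number of matching lines (0 if none match);
    # "|| true" keeps a zero count from being treated as a failure.
    timeouts=$(grep -c 'timeout after 30 sec' "$logfile" || true)
    retries=$(grep -c 'retrying request' "$logfile" || true)
    echo "timeouts=$timeouts retries=$retries"
}
```

It would be called as "count_scsi_errors /usr/adm/SYSLOG".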

ALSO, as bru was extracting from the file "bru.dat", the following messages
would show up at irregular intervals (and not necessarily in the following
order):

    bru "filename" warning - X block checksum error
    bru "filename2" warning - block sequence error
    bru "filename3" warning - file synchronization error - attempting recovery.

*** The same SCSI timeout errors happened when I used
***
***     cp -r /usr/people/* /usr2/people/
***
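
One way to see whether the copy itself was damaged, and not just slowed
down by the resets, would be to compare the two trees file by file.  The
sketch below is an assumption on my part, not something I have run on the
PI: compare_trees is a made-up name, and it assumes a POSIX shell plus the
standard find and cmp utilities.

```shell
# Hypothetical helper: report files under $1 that are missing from, or
# differ from, the same relative path under $2.
compare_trees() {
    src=$1
    dst=$2
    # List regular files relative to the source tree, then compare each
    # one byte-for-byte with its counterpart in the destination tree.
    ( cd "$src" && find . -type f -print ) | while read -r f; do
        cmp -s "$src/$f" "$dst/$f" || echo "MISMATCH: $f"
    done
}
```

For the copy above it would be called as
"compare_trees /usr/people /usr2/people"; no output means every file
matched.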

The hard disk is "auto-terminating", so it does not need a SCSI terminator.

That system is on one of our satellite campuses, so it's hard to keep carrying
equipment back and forth (i.e. carry the disk here and try it on our own PI,
or come back here and get another cable, or terminator, or anything else...)

What is wrong??  Does anyone have any clue??  Any suggestions??

Thank you for your suggestions,

     Claude Cantin
     National Research Council