4.3 Tahoe dump bug
Tait Cyrus
cyrus at pprg.unm.edu
Mon Dec 19 09:25:43 AEST 1988
While trying to get the 4.3 Tahoe dump running on a Sun 3 under
SunOS 3.X, I (along with others) have run into the following
bug (or feature):
>Writing dump file 0 (/research)
> DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
> DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
> DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
> DUMP: mapping (Pass I) [regular files]
> DUMP: mapping (Pass II) [directories]
> DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
> DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
> .
> .
> .
> DUMP: (This should not happen)bread from /dev/rxy1g [block 372316]: count=1040, got=1536
> DUMP: (This should not happen)bread from /dev/rxy1g [block 378344]: count=24, got=512
> DUMP: More than 32 block read errors from 152660
> DUMP: This is an unrecoverable error.
> DUMP: NEEDS ATTENTION: Do you want to attempt to continue?: ("yes" or "no") no
> DUMP: The ENTIRE dump is aborted.
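In every logged error, got is count rounded up to the next multiple of 512 bytes (24 to 512, 536 to 1024, 1040 to 1536), so the n == cnt test in bread fails for each of those reads. The small helper below only makes that arithmetic explicit. roundup_sector is a hypothetical name, and the 512-byte sector size is read off the log, not taken from the dump or SunOS sources.

```c
/* Hypothetical helper: round a byte count up to the next multiple of
 * a sector size.  Applied to the "count" values in the log with a
 * 512-byte sector, it reproduces the "got" values. */
long roundup_sector(long cnt, long sector)
{
	return ((cnt + sector - 1) / sector) * sector;
}
```

For example, roundup_sector(536, 512) gives 1024, matching the second logged error.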
This error is produced by the routine bread in dumptraverse.c. I am
having a hard time figuring out what the heck this routine is
"supposed" to be doing. I think there are several bugs in it, and
that it should look something like this:
bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

	if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
		msg("bread: lseek fails\n");
	}
	while (cnt) {
		n = read(fi, ba, cnt);
		if (n == 0) {
			msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
			    disk, da, cnt, n);
			broadcast("DUMP IS AILING!\n");
			msg("This is an unrecoverable error.\n");
			if (!query("Do you want to attempt to continue?")){
				dumpabort();
				/*NOTREACHED*/
			}
		}
		cnt -= n;
		ba += n;
	}
}
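For comparison, here is a hedged, self-contained sketch of the read-until-complete idiom that the proposed loop is aiming at, written against plain POSIX read(2). The name read_fully and its return convention are hypothetical, not taken from dump. Unlike the loop above, it stops on end-of-file (read returns 0) and on errors (read returns -1) rather than continuing the loop with those values; the caller is assumed to have done the lseek already, as bread does.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of dump: read up to cnt bytes from fd
 * into buf, continuing after short reads.  Returns the number of bytes
 * read (less than cnt only if end-of-file was hit), or -1 on error. */
ssize_t read_fully(int fd, char *buf, size_t cnt)
{
	size_t done = 0;

	while (done < cnt) {
		ssize_t n = read(fd, buf + done, cnt - done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* real I/O error */
		}
		if (n == 0)
			break;			/* end-of-file: caller sees done < cnt */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A bread built on it would treat any return other than cnt as a read error and report it, instead of spinning on a zero-length read.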
The existing bread currently looks like this:
bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

loop:
	if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
		msg("bread: lseek fails\n");
	}
	n = read(fi, ba, cnt);
	if (n == cnt)
		return;
	if (da + (cnt / dev_bsize) > fsbtodb(sblock, sblock->fs_size)) {
		/*
		 * Trying to read the final fragment.
		 *
		 * NB - dump only works in TP_BSIZE blocks, hence
		 * rounds `dev_bsize' fragments up to TP_BSIZE pieces.
		 * It should be smarter about not actually trying to
		 * read more than it can get, but for the time being
		 * we punt and scale back the read only when it gets
		 * us into trouble. (mkm 9/25/83)
		 */
		cnt -= dev_bsize;
		goto loop;
	}
	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
	    disk, da, cnt, n);
	if (++breaderrors > BREADEMAX){
		msg("More than %d block read errors from %d\n",
		    BREADEMAX, disk);
		broadcast("DUMP IS AILING!\n");
		msg("This is an unrecoverable error.\n");
		if (!query("Do you want to attempt to continue?")){
			dumpabort();
			/*NOTREACHED*/
		} else
			breaderrors = 0;
	}
}
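The comment in the existing code admits that it "should be smarter about not actually trying to read more than it can get." One way to do that, sketched here under the assumption that the file-system size is already known in dev_bsize units (what fsbtodb(sblock, sblock->fs_size) yields in dump), is to apply the same cnt -= dev_bsize scaling before issuing the read rather than after it fails. clamp_count is a hypothetical name, not a dump routine.

```c
/* Hypothetical helper, not part of dump: scale back a read of cnt
 * bytes starting at device block da so that it does not extend past
 * fs_size_db, the file-system size in dev_bsize units.  Uses the same
 * test and the same "cnt -= dev_bsize" step as bread, but up front. */
int clamp_count(long da, int cnt, long fs_size_db, int dev_bsize)
{
	while (cnt > 0 && da + (cnt / dev_bsize) > fs_size_db)
		cnt -= dev_bsize;
	return cnt > 0 ? cnt : 0;	/* nothing readable past the end */
}
```

With a dev_bsize of 512, clamp_count(999, 1024, 1000, 512) returns 512: a 1024-byte request starting at the last block of a 1000-block file system is cut back to that one block.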
Am I misinterpreting what bread is supposed to be doing?
Will my version work? If not, why not?
Thanks
---
W. Tait Cyrus (505) 277-0806 e-mail: cyrus at pprg.unm.edu
University of New Mexico
Dept of ECE - Parallel Processing Research Group
Albuquerque, New Mexico 87131
More information about the Comp.bugs.4bsd.ucb-fixes
mailing list