The 3B1 and the Bad Block

Thad P Floryan thad at cup.portal.com
Sun Jan 13 19:36:28 AEST 1991


jeff at cjsa.wa.com (Jeffery Small) in <1991Jan12.014524.300 at cjsa.wa.com> writes:

	I have been getting a number of the following HDERR reports in
	/usr/adm/unix.log (long lines have been wrapped for readability):

[A]	HDERR ST:51 EF:40 CL:E3 CH:3 SN:7 SC:1 SDH:22 DMACNT:FFFF DCRREG:9A
	MCRREG:8F00 Thu Dec 27 11:11:00 1990

[B]	WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=995. sc=7. hd=2. dr#=0. MCR2:0x0
	Thu Dec 27 11:11:04 1990
    
[C]	drv:0 part:2 blk:58635 rpts:1 Fri Jan 11 07:19:14 1991

	So my questions are:

	1:  Why didn't the block number in the error report (58635) work?  What
	    (probably obvious) idea am I missing and how should I properly fix
	    this problem?

The "blk:58635" is with respect to the THIRD HD partition "part:2" (counting
from 0).  You need to convert that to a block number with respect to the
beginning of the HD for use with the bad-block mapper in s4diag.

I have a program "hdhelp" that performs the calculation of the "real" block
number given items [A] and/or [C] above from /usr/adm/unix.log along with the
"-t" partitioning report data from "iv".  The program is still in an "ALPHA"
stage because I'm still "playing" with it to handle other error reports and to
report the bad block in several different forms including byte-offset.

I also have a thought that "hdhelp" might eventually be adapted to "correct"
the problem (by mapping out the bad blocks) "online" ... but there are
potential nasties with doing this on a mounted file system and I haven't yet
given any thought to a strategy for it.

To get the partition information needed, you can run "iv" su'd online thusly:

	# iv  -t  /dev/rfp000

or you can request the same information from one of s4diag's menus.

I've included a "shar" of the present version 0.1 "hdhelp" at the end of this
posting since it IS useful in its present form.  NO DOCS are yet available but
it should be easy to follow the code and the comments.  I've already used it
to calculate the bad blocks to be mapped-out on 5 systems, but the program will
change considerably before the version 1.0 "official" release.  One note: the
program should be compiled and run on a system OTHER than the one with the
problem!  :-)   I believe it'll even compile and run with the C on my C-64.

	2:  Did I just (potentially) hose some disk files by entering the two
	    good sectors into the BBT or did the contents of these sectors get
	    copied to the alternate track by the diagnostic routine?  If the
	    data was not copied, is there a way that I could determine the
	    files (if any) which were damaged?

Yep, you hosed 'em real good!  I seriously doubt the s4diag bad-block mapper
copies or zaps anything, so the information in the original blocks should
still be intact.  Given that you already know how to run "bf":

	So, I used Brant Cheikes' great "bf" program to determine that block
	number 58635 was allocated to inode #2583.  ncheck then told me that
	this was currently assigned to my 1Mb-sized Cnews "history" file.

you could do the same thing to determine the file(s) to which the blocks you
did map out were assigned.  In this case, you have to convert the partition 0
block number (which is what s4diag uses) to a partition 2 block number for bf
to do its thing.  This calculation is the INVERSE of what "hdhelp" does.  I
don't know if you have to un-bad-block-map the two "good" blocks you canned,
but you can try it both ways.

	3:  Can I use the same diagnostic routine to recover the use of these
	    two sectors by "Deleting" them from the BBT?  If not, what is the
	    "Delete" option for?

Yes, you "should" be able to recover the two erroneously mapped-out blocks
using the "Delete" option.

One thing that "bothers" me with your posting is that you didn't indicate
whether you mapped out both sectors of a logical block or whether you just
mapped out by single sector.  If you simply use "Delete" to undo what you
originally did, you should be OK.  But keep in mind that a 1K logical block
on the 3B1 comprises two sectors (physical 512-byte blocks).

Thad Floryan [ thad at cup.portal.com ]

---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 01/13/1991 08:17 UTC by thad at thadlabs
# Source directory /u/thad/temp
#
# existing files will NOT be overwritten unless -c is specified
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#    291 -rw-r--r-- Makefile
#   6512 -rw-r--r-- hdhelp.c
#
if touch 2>&1 | fgrep 'amc' > /dev/null
 then TOUCH=touch
 else TOUCH=true
fi
# ============= Makefile ==============
if test -f 'Makefile' -a X"$1" != X"-c"; then
	echo 'x - skipping Makefile (File already exists)'
else
echo 'x - extracting Makefile (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'Makefile' &&
X# 3B1 makefile for hdhelp
X#
XCC	=	cc
XCFLAGS	=	-O
XLDFLAGS	=	-s
XLIBS	=	/lib/crt0s.o /lib/shlib.ifile
XNAME	=	hdhelp
XOBJS	=	hdhelp.o
XDEST	=	/usr/local/bin
X
X$(NAME)	:	$(OBJS)
X		$(LD) $(LDFLAGS) -o $(NAME) $(OBJS) $(LIBS)
X
Xinstall :	$(NAME)
X		mv $(NAME)  $(DEST)/.
X
Xclean	:
X		rm -f $(OBJS) core *~
SHAR_EOF
$TOUCH -am 0113001591 'Makefile' &&
chmod 0644 Makefile ||
echo 'restore of Makefile failed'
Wc_c="`wc -c < 'Makefile'`"
test 291 -eq "$Wc_c" ||
	echo 'Makefile: original size 291, current size' "$Wc_c"
fi
# ============= hdhelp.c ==============
if test -f 'hdhelp.c' -a X"$1" != X"-c"; then
	echo 'x - skipping hdhelp.c (File already exists)'
else
echo 'x - extracting hdhelp.c (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'hdhelp.c' &&
X/*	hdhelp
X *
X *	This program helps identify the bad block(s) reported in the file
X *	/usr/adm/unix.log and/or on the screen of the UNIXPC/3B1/PC7300.
X *
X *	Usage:
X *
X *		hdhelp  [ -# ]
X *
X *	Where:	# = method number if both are not desired
X *
X *	The format of HD errors with kernels up to and including 3.51a is:
X *
X *		HDERR ST:11 EF:40 CL:4241 CH:4201 SN:420C SC:4202 SDH:4223 \
X *		DMACNT:FFFF DCRREG:93 MCRREG:8100 Tue Dec 27 02:23:51 1988
X *
X *		drv:0 part:2 blk:15510 rpts:1 Tue Dec 27 02:23:53 1988
X *
X *	The bad block can be calculated using two methods, each as a check
X *	on the other, depending on the available data.
X *
X *	The first method uses the ...
X */
X
X#include <stdio.h>
X
Xstatic char *version = "@(#) hdhelp 0.1 Thad Floryan 17-Oct-1990";
X
Xmain(argc, argv)
X	int	argc;
X	char	*argv[];
X{
X	extern int scanf();
X
X	int	method1 = 0;
X	int	method2 = 0;
X	int	choice;
X	int	CL;	/* Cylinder LOW: only lower byte is significant */
X	int	CH;	/* Cylinder HIGH: only lower byte is significant */
X	int	SN;	/* Sector Number: only lower byte is significant */
X	int	SDH;	/* Head Number: only lower nybble is significant */
X	int	num_heads;		/* number of HD heads */
X	int	part_num;		/* current partition number */
X	int	block_num;		/* HD block number */
X	int	track;			/* HD track number */
X	int	part_blocks[17];	/* partition data from s4test DIAG */
X	int	part_index;		/* subscript for part_blocks[] */
X	int	block1 = 0;		/* method 1 results */
X	int	sector1 = 0;		/* method 1 results */
X	int	block2 = 0;		/* method 2 results */
X	int	sector2 = 0;		/* method 2 results */
X	int	blocks_per_track = 8;	/* UNIXPC has eight 1024-byte blocks */
X					/* same as sixteen 512-byte sectors */
X					/* with one spare per track */
X	int	sectors_per_block = 2;	/* UNIXPC with 1K file system (std) */
X
X	if (argc == 1)
X	{
X		method1 = method2 = 1;
X	}
X	else if (argc == 2)
X	{
X		choice = -atoi(argv[1]);
X		if (choice == 1)
X		{
X			method1 = 1;
X		}
X		else if (choice == 2)
X		{
X			method2 = 1;
X		}
X		else
X		{
X			DoUsage(argv[0]);
X		}
X	}
X	else
X	{
X		DoUsage(argv[0]);
X	}
X
X	printf(
X"You will be asked to supply several values from the HD error report found\n");
X	printf(
X"in /usr/adm/unix.log and/or from the s4test DIAG report; enter each value\n");
X	printf(
X"followed by a RETURN.  If the data available is only that which appears\n");
X	printf(
X"on your UNIXPC's screen, select method 1.  Be SURE to read this program's\n");
X	printf(
X"accompanying documentation!  You use this program at your own risk.  The\n");
X	printf(
X"program's author believes this program to be correct, but, in ALL cases,\n");
X	printf(
X"you, the user, are responsible for the (mis)use, (mis)interpretation, and\n");
X	printf(
X"(mis)application of this program's calculations.  Be forewarned!\n");
X
X	PromptDec("\n\tNumber of HD heads? ", &num_heads);
X
X	if (method1 != 0)
X	{
X		printf("\nMETHOD 1 DATA INPUT:\n\n");
X
X		printf(
X"The values for the next 4 inputs can be found in the /usr/adm/unix.log\n");
X		printf(
X"on the long line which begins \"HDERR ST: ...\"\n\n");
X
X		PromptHex("\tvalue of  CL:", &CL);
X		PromptHex("\tvalue of  CH:", &CH);
X		PromptHex("\tvalue of  SN:", &SN);
X		PromptHex("\tvalue of SDH:", &SDH);
X
X		block1  =   (CH  & 0xFF) * 256 * num_heads * blocks_per_track
X			  + (CL  & 0xFF)       * num_heads * blocks_per_track
X			  + (SDH & 0x0F)                   * blocks_per_track
X			  + ((SN & 0xFF) >> 1);
X
X		sector1 =   (CH  & 0xFF) * 256 * num_heads * blocks_per_track
X			  + (CL  & 0xFF)       * num_heads * blocks_per_track
X			  + (SDH & 0x0F)                   * blocks_per_track;
X		sector1 *=  sectors_per_block;
X		sector1 +=  (SN  & 0xFF);
X	}
X
X	if (method2 != 0)
X	{
X		printf("\nMETHOD 2 DATA INPUT:\n\n");
X
X		printf(
X"The values for the next 2 inputs can be found in the /usr/adm/unix.log\n");
X		printf(
X"on the line which looks like \"drv:0 part:2 blk:25916 rpts:1 ...\"\n");
X		printf(
X"The prompt calculations are assuming %d heads as previously entered.\n\n",
X			num_heads);
X
X		PromptDec("\tpart:", &part_num);
X		PromptDec("\t blk:", &block_num);
X
X		printf(
X"\nThe values for the next %d inputs are from the s4test DIAG disk report\n\n",
X			part_num + 1);
X
X/*
X *	Ask for one more partition than needed just so no-one feels
X *	queasy about not entering everything on the s4test report.
X *	Believe me, this is important user psychology.
X */
X		for (part_index=track=0; part_index <= part_num; part_index++)
X		{
X			printf("\tPartition %d: start Track=%d, ",
X				part_index, track);
X
X			PromptDec("size (in Blocks)=",
X				&part_blocks[part_index]);
X
X			track += (part_blocks[part_index] / blocks_per_track);
X
X			if (part_index < part_num)
X			{
X				block2 += part_blocks[part_index];
X			}
X		}
X		block2 += block_num;
X		sector2 = block2 * sectors_per_block;
X	}
X
X	if (method1 != 0)
X	{
X		printf("\nMETHOD 1 RESULTS:\n\n");
X		printf("For a HD with %d heads and error report per ",
X			num_heads);
X		printf("\"CL:%04X CH:%04X SN:%04X SDH:%04X\"\n\n",
X			CL, CH, SN, SDH);
X		printf("\tThe partition 0 block number is %d\n", block1);
X		printf("\tThe partition 0 sector number is %d\n", sector1);
X	}
X
X	if (method2 != 0)
X	{
X		printf("\nMETHOD 2 RESULTS:\n\n");
X		printf(
X"For a HD error on \"part:%d blk:%d\" and partitioned per:\n",
X			part_num, block_num);
X		for (part_index=track=0; part_index <= part_num; part_index++)
X		{
X			printf(
X"\tPartition %d: start Track=%d, size (in Blocks)=%d\n",
X				part_index, track, part_blocks[part_index]);
X			track += (part_blocks[part_index] / blocks_per_track);
X		}
X		printf("\n\tThe partition 0 block number is %d\n", block2);
X		printf("\tThe partition 0 sector numbers are %d and %d\n",
X			sector2, sector2 + 1);
X	}
X
X	if (method1 != 0 && method2 != 0)
X	{
X		if ((block1 == block2) &&
X			((sector1 == sector2) || (sector1 == sector2 + 1)))
X		{
X			printf(
X"\nThe two methods concur, so you can proceed per the documentation.\n");
X		}
X		else
X		{
X			printf(
X"\nThe values for the blocks disagree; please check your data input.\n");
X		}
X	}
X}
X
X
XPromptDec(msg, val)
X	char	*msg;
X	int	*val;
X{
X	extern int strlen();
X
X	char	inbuf[81];
X
X	printf(msg);
X	fgets(inbuf, 80, stdin);
X	inbuf[strlen(inbuf) - 1] = '\0';	/* null out newline */
X	sscanf(inbuf, "%d", val);
X}
X
X
XPromptHex(msg, val)
X	char	*msg;
X	int	*val;
X{
X	extern int strlen();
X
X	char	inbuf[81];
X
X	printf(msg);
X	fgets(inbuf, 80, stdin);
X	inbuf[strlen(inbuf) - 1] = '\0';	/* null out newline */
X	sscanf(inbuf, "%x", val);
X}
X
X
XDoUsage(pname)
X	char	*pname;
X{
X	printf("usage: %s  [ -# ]\n", pname);
X	printf("where: # is either 1 or 2; see program docs\n%s\n", version+5);
X	exit(1);
X}
SHAR_EOF
$TOUCH -am 0113001591 'hdhelp.c' &&
chmod 0644 hdhelp.c ||
echo 'restore of hdhelp.c failed'
Wc_c="`wc -c < 'hdhelp.c'`"
test 6512 -eq "$Wc_c" ||
	echo 'hdhelp.c: original size 6512, current size' "$Wc_c"
fi
exit 0


