awk bug

John Rupley rupley at arizona.edu
Mon Feb 8 20:49:58 AEST 1988


In article <2161 at ttrdc.UUCP>,  levy at ttrdc.UUCP (Daniel R. Levy) writes:
> In article <3748 at megaron.arizona.edu>, rupley at arizona.edu (John Rupley) writes:
> > In article <672 at pttesac.UUCP>, vanam at pttesac.UUCP (Marnix van Ammers) writes:
> > > While trying to install a new program I ran across a bug in our Sys
> > > V, release 2.1.1 (AT&T 3B20) awk.  In our awk the following pattern
> > > always matches (even if there are 5 or less fields on the current
> > > line):
> > > if $6 != ""
> > > This does not happen on the awk on my 3B1 version 3.51 .
> > > Is this a known bug or what?
> > Could it be a corrupt copy of awk on your release 2 system?
> > The following code executes properly with my SysV.r2 awk and
> > with the new awk (your 3.51 version?):
> > echo $* | awk '$6 != ""	{print "$6_!=_zerolength", NR, NF, $6}'
> > echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'
> 
> Alas, I must plead guilty (even though I'm not responsible for awk, I'm still
> a Death-Starian) for awk's behavior in this manner on the 3B20 (we're running
> 2.0v3 here).  It's coming from a dereference of a null pointer (the string
> "f{\0" is present beginning at location zero in a 3B20 process).  

This is a bit off the thread of the awk bug, but if the 3B20 can't
cope with awk dereferencing a NULL pointer, how does it handle C code like:
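	.
	cmpstr(strchr("abcdef", 'g'), "hijk");	/* no 'g': strchr() returns NULL */
	.
cmpstr(s, t)		/* compares like strcmp() */
char *s, *t;
{
	for ( ; *s == *t; s++, t++)	/* first *s dereferences NULL */
		if (*s == '\0')
			return (0);
	return (*s - *t);
}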
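Because strchr() finds no 'g', it returns NULL, and the first *s inside cmpstr reads whatever happens to sit at address 0: a zero byte on the VAX, "f{" on the 3B20. A minimal defensive sketch, assuming the caller wants a null pointer treated as the empty string (a design choice, not something the post prescribes); cmpstr_safe is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical NULL-tolerant compare: a NULL argument is treated as
 * the empty string "", so nothing is ever read from address 0.
 */
int cmpstr_safe(const char *s, const char *t)
{
    if (s == NULL)
        s = "";
    if (t == NULL)
        t = "";
    return strcmp(s, t);
}
```

Mapping NULL to "" reproduces what the VAX happened to give by accident; returning an error indication instead would be an equally reasonable choice.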

> If Rupley
> is using a VAX, on the other hand, everything will seem to be hunky-dory
> (location 0 in a VAX [System V UNIX] process contains a zero byte, which is
> tantamount to a null string).

Rupley was using an 80286 machine, and more about that below.

> I would posit that, just as when programming in C, testing a field without
> first knowing that it is valid (the field count is high enough) is poor
> programming practice.  I will eat these words if someone can show me awk
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> documentation that says that an undefined positional parameter is guaranteed
> to be null/0 just as an undefined member of an array or previously unused
> variable is guaranteed to be.  

Wow (:-!!  Consider A-K-W (Aho, Kernighan, and Weinberger), "The AWK
Programming Language", Addison-Wesley, 1988, p. 192:

'Fields that are explicitly null have the string value ""; they are not
numeric.  Nonexistent fields (i.e., fields past NF) and $0 for blank 
lines are treated this way too.'
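An interpreter can honour that rule by having its field accessor hand back a real empty string for any field past NF, rather than an unset pointer. The C sketch below is only an illustration under assumptions (a fixed MAXFIELDS limit, default-style splitting on runs of blanks and tabs via strtok(), and the hypothetical names splitrec, getfield, fields and nf); it is not taken from any awk source:

```c
#include <assert.h>
#include <string.h>

#define MAXFIELDS 100   /* hypothetical fixed limit for this sketch */

/* Hypothetical record state: fields[1..nf] point into the record. */
static char *fields[MAXFIELDS + 1];
static int nf;

/* Split rec in place on runs of blanks, tabs and newlines. */
static void splitrec(char *rec)
{
    char *p;

    nf = 0;
    for (p = strtok(rec, " \t\n"); p != NULL && nf < MAXFIELDS;
         p = strtok(NULL, " \t\n"))
        fields[++nf] = p;
}

/* $i: a nonexistent field (i > nf) yields "", never a NULL pointer. */
static const char *getfield(int i)
{
    assert(i >= 1);     /* $0 is not modelled in this sketch */
    if (i > nf)
        return "";
    return fields[i];
}
```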

The above statement is in the "summary" of the awk language (Appendix 
A). It took only a few minutes and the index to find equivalent 
statements, perhaps a bit clearer, in other sections of the book.

> (I've written many a line of awk code using
> much the same care I would use with C, and never tripped over this problem.)
> Barring such a guarantee, and certainly in the present situation, it is better
> practice, given that one knows that there may be fewer than six positional
> parameters in an input record, to use
> 
> 	NF >= 6 { action using $6 }
> 
> than it is to use
> 
> 	$6 != "" { action using $6 }
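The two guards are not interchangeable even on a correct awk. With FS=":", the record a:b:c:d:e: has NF = 6 and an explicitly null $6, so NF >= 6 is true while $6 != "" is false; for a record with fewer than six fields both are false. The C sketch below models that behaviour under assumptions: splitfs, getfield, guard_nf and guard_value are hypothetical names, and only a single-character separator is handled:

```c
#include <assert.h>
#include <string.h>

#define MAXFIELDS 100   /* hypothetical limit; extra fields are dropped */

static char *fields[MAXFIELDS + 1];
static int nf;

/* Split rec in place on the single character fs, as awk does with
   FS=":": adjacent or trailing separators produce empty fields. */
static void splitfs(char *rec, char fs)
{
    char *p = rec;

    nf = 0;
    if (*rec == '\0')
        return;         /* an empty record has no fields */
    while (nf < MAXFIELDS) {
        char *sep = strchr(p, fs);

        fields[++nf] = p;
        if (sep == NULL)
            break;
        *sep = '\0';
        p = sep + 1;
    }
}

/* Fields past nf read as "". */
static const char *getfield(int i)
{
    return (i >= 1 && i <= nf) ? fields[i] : "";
}

static int guard_nf(void)    { return nf >= 6; }                       /* NF >= 6  */
static int guard_value(void) { return strcmp(getfield(6), "") != 0; } /* $6 != "" */
```

So NF >= 6 asks whether the field exists, while $6 != "" asks whether it has a non-null value; which one is "right" depends on what the action needs.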

Defensive coding is probably like motherhood.  But perhaps in this case
the mothers lose.  You can create fields within an awk program (e.g.,
$5 when NF = 3), and, quoting A-K-W, p. 36:

"Any intervening fields are created when necessary and given null values."

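A sketch of what "created when necessary" amounts to, again in C with hypothetical names (setfield, fields, nf) and a fixed limit: assigning $5 when NF = 3 raises NF to 5 and leaves $4 as the null string. Real awk also rebuilds $0 with OFS after such an assignment; that step is omitted here:

```c
#include <assert.h>

#define MAXFIELDS 100   /* hypothetical limit for this sketch */

static const char *fields[MAXFIELDS + 1];
static int nf;

/* Assign $i = val.  Assigning past NF raises NF to i and fills every
   intervening field with the null string "". */
static void setfield(int i, const char *val)
{
    assert(i >= 1 && i <= MAXFIELDS);
    while (nf < i - 1)
        fields[++nf] = "";      /* intervening fields get null values */
    if (i > nf)
        nf = i;
    fields[i] = val;
}
```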

Back to awk, the 80286, and bugs.  First, there are indeed bugs in awk, 
specifically the new awk.  There have been several postings of new awk 
bugs, with fixes.  I don't think a deficiency (feature?) of the 3B20 
system should be considered an awk bug, however. Second, bringing new 
awk up on an 80286 was a bit unpleasant, owing to the coders' 
assumption that sizeof (int) = sizeof (int *).  Should we be horrified, 
annoyed, or whatever that AT&T, the home of C, assumes all the world's 
a VAX (:-?  Seriously, I do hope that future software sold by AT&T will 
be written to be properly portable.
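For anyone porting similar code, a short C sketch of the difference. On the 80286, int is commonly 16 bits while large-model pointers are 32 bits, so storing a pointer in an int and casting it back loses the segment half. The portable routes are to keep pointers in pointer-typed storage (here a hypothetical union cell) or, where an integer is unavoidable, to use intptr_t from <stdint.h>, which C99 added long after this posting and which the standard makes optional:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Non-portable (the sizeof (int) == sizeof (int *) assumption):
 *     int cell = (int) p;  ...  p = (char *) cell;
 * This silently truncates wherever int is narrower than a pointer.
 */

/* Hypothetical interpreter value cell: each member keeps its own type. */
union cell {
    long  num;
    char *str;
};

/* When an integer really is required, intptr_t round-trips a void *. */
static intptr_t ptr_to_int(void *p)     { return (intptr_t) p; }
static void    *int_to_ptr(intptr_t i)  { return (void *) i; }
```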

> |------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
> |         an Engihacker @        |  	<most AT&T machines>}!ttrdc!ttrda!levy
> | AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
> |--------Skokie, Illinois--------|

John Rupley
 uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
 internet: rupley!local at megaron.arizona.edu
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929



More information about the Comp.bugs.sys5 mailing list