non-blocking read

Guy Harris guy at rlgvax.UUCP
Mon Feb 20 05:01:31 AEST 1984


> One thing I neglected to mention, is that after clearing the ICANON bit
> in the termio structure, you will want to mask off the sign bit :

> #define CMASK	0377
> int	c;	/* Good idea using int if comparing to constants.	*/
> 	
> 	if( read( 0, (char *)&c, 1 ) )
> 		switch( c & CMASK )
> 		UP: ...

False.  The ICANON bit has *no effect* on the width of the terminal-
to-computer path in bits.  That is controlled by the ISTRIP bit in c_iflag
and the character-size (CSIZE) bits in c_cflag.  You may be confusing this
with RAW mode, which was the only way in V6 UNIX to get characters as they
were typed and which *did* turn off parity checking and parity-bit stripping
(at least in the later V6-based UNIXes, like PWB/UNIX).  RAW still turned off
all that stuff in V7, but in V7 the way to get characters as they are typed
is to turn on CBREAK, which has no effect on output (modulo a small Berkeley
UNIX hack) and which affects only canonicalization on input.  ICANON has the
same intent: disable canonicalization and don't do anything else.
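As an aside for readers on newer systems: the POSIX termios interface (which
post-dates this article but maps directly onto the termio fields above) makes
the point concrete.  A minimal sketch of clearing ICANON, and *only* ICANON:

```c
#include <termios.h>

/* Sketch: put a termios structure into a CBREAK-like state by turning
 * off canonical input.  ISTRIP in c_iflag and the CSIZE bits in c_cflag
 * are deliberately left alone -- ICANON has nothing to do with the
 * width of the input path. */
void make_cbreak(struct termios *t)
{
	t->c_lflag &= ~ICANON;	/* disable line-at-a-time canonicalization */
	t->c_cc[VMIN]  = 1;	/* read() returns after one character */
	t->c_cc[VTIME] = 0;	/* no inter-character timeout */
}
```

In real code you would fetch the current modes with tcgetattr(), modify them,
and push them back with tcsetattr().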

Furthermore, declaring "c" as an "int" is a very *bad* idea.  This code
will only work on a machine like those of the 11 family, which store the
bytes of a word from the bottom up, i.e.

	+--------+--------+
	| Byte 1 | Byte 0 |	a word
	+--------+--------+
       MSB               LSB

"(char *)&c" always points to "byte 0" of a word.  In the case of an 11 family-
type machine, this means that filling in whatever "(char *)&c" points to will
fill in the lower bits of the word, so if byte 1 is zero this means that "c"
is the zero-extended value of the byte stuffed into "byte 0" of "c".

However, on a machine like most of the other machines in the world (the
Motorola M68000, for instance; also note that the DoD Internet standard byte
order is *not* the 11 family byte order!), the bytes of a word are stored
from the top down, i.e.

	+--------+--------+
	| Byte 0 | Byte 1 |	a word
	+--------+--------+
       MSB               LSB

"(char *)&c" still points to byte 0, but this is now the eight *most*
significant bits of the word.  Storing a value where "(char *)&c" points,
assuming byte 1 is still zero, will now produce an integer value which is
the value of the character times 256!
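This is easy to demonstrate.  The sketch below mimics what the read() in the
quoted code does, with the variable explicitly zeroed first (the original
didn't even do that, so byte 1 actually holds whatever garbage was lying
around):

```c
/* Store one byte where (char *)&c points, as read(0, (char *)&c, 1)
 * would, and return the resulting integer value. */
int byte_into_int(unsigned char byte)
{
	int c = 0;		/* the quoted code never cleared c at all */
	*(char *)&c = byte;	/* fill in "byte 0" of the word */
	return c;
}
```

On an 11-family-style (bottom-up) machine byte_into_int('A') is 'A'; on a
top-down machine with 4-byte "int"s it is 'A' shifted into the most
significant byte.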

This bug used to exist in early versions of UUCP, and *did* cause us a problem
when putting that UUCP on a non-11 family machine.  The System III UUCP fixed
it by the simple expedient of declaring the variable to be a "char" rather than
an "int".

(For those of you on machines with 4-byte "int"s, this argument still holds,
the diagrams just look a little different.)

Furthermore, the reason the masking was being suggested is that "char"s may
be sign-extended when compared with "int"s, so a "char" containing the bit
pattern 0377 (all bits on) would be sign-extended; on a machine with 16-bit
"int"s which sign-extends "char"s it would actually be treated as if it had
the value 0177777.  As such, that "char" would *not* compare equal to 0377.
It *would*, however, compare equal to '\377', because that constant takes on
the sign-extended value of the character.  The proper solution, unless you
have an older compiler that lacks "unsigned char" (like, I believe, the V7
PDP-11 C compiler), is to declare "c" to be an *unsigned* "char".
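A sketch of the pitfall and the fix, for machines that sign-extend "char":

```c
/* On a machine with signed chars, the bit pattern 0377 in a plain char
 * is promoted to -1 (0177777 on a 16-bit machine) before the compare,
 * so this returns 0 there.  On a machine with unsigned chars it
 * returns 1.  In other words: unportable. */
int plain_char_is_0377(char c)
{
	int target = 0377;
	return c == target;
}

/* With unsigned char the promotion yields 255 == 0377, everywhere. */
int unsigned_char_is_0377(unsigned char c)
{
	return c == 0377;
}
```

With "unsigned char c;" in the quoted code, the CMASK masking can simply be
dropped.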

If what you wanted was a variable which holds a small unsigned number (0 to
255), declare it as an "unsigned char".  If what you wanted was a variable
which holds a small *signed* number (-128 to 127), declare it as a "char"
*but* be forewarned that C does *not* guarantee that "char"s are signed.
On the AT&T Technologies 3B series, "char"s are unsigned, so there is no
way to declare something to be an 8-bit signed integer.  If what you wanted
was a variable which holds an ASCII (7-bit) character, declare it as a
"char" because the sign bit will never be on and the question is moot anyway.
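A small probe, as a sketch, for which camp your machine falls into; CHAR_MIN
comes from <limits.h>.  (Note that ANSI C, which post-dates this article,
later added "signed char" as a portable way to ask for an 8-bit signed
integer.)

```c
#include <limits.h>

/* Does a plain "char" hold negative values on this machine?
 * On the AT&T 3B series (unsigned chars) this would return 0;
 * on a PDP-11 or VAX it returns 1. */
int char_is_signed(void)
{
	return CHAR_MIN < 0;
}
```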

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy


