Randomly-Signed Character Variables

Henry Spencer henry at utzoo.UUCP
Thu Sep 13 06:01:38 AEST 1984


> ...  what will
> happen if char variables are randomly sign-extended?  In other words, does
> a portable program assume that char variables are consistent in their
> sign-extension?

Interesting question.  One can argue (I have been heard to do so) that
if a program is to be portable, it can use char variables for only two
things:  (1) characters from the machine's standard character set, which
C guarantees to be non-negative, and (2) small non-negative integers.  If
a program is portable in this fairly strong sense, there's no problem
because the top bit is never on and the sign-extension behavior is
irrelevant.
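
To make the "top bit is never on" argument concrete, here is a minimal
sketch in present-day C.  The helper names are hypothetical, and SCHAR_MAX
comes from <limits.h>, which is an assumption about the implementation
rather than anything the draft quoted in this post is known to provide.
It only illustrates that, for values in 0..SCHAR_MAX, widening the stored
byte as signed or as unsigned gives the same int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: true if v is a small non-negative integer whose
   top bit would be clear when stored in a char. */
int top_bit_clear(int v)
{
    return v >= 0 && v <= SCHAR_MAX;
}

/* Widen the stored byte to int as if chars were sign-extended. */
int widen_signed(char c)
{
    return *(signed char *)&c;
}

/* Widen the stored byte to int as if chars were never sign-extended. */
int widen_unsigned(char c)
{
    return *(unsigned char *)&c;
}

/* For a value with the top bit clear, both widenings give v back,
   so the sign-extension behavior cannot matter. */
int widening_is_irrelevant(int v)
{
    char c;

    assert(top_bit_clear(v));
    c = (char)v;
    return widen_signed(c) == v && widen_unsigned(c) == v;
}
```

A program that keeps every char value inside the range accepted by
top_bit_clear() is portable in the strong sense described above.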

One place where I would foresee problems is in things like hashing and
checksums.  I have been known to write code that stated, in a comment,
"doesn't matter whether chars are signed or not, but it better be
consistent!".  I never analyzed those programs deeply to determine whether
there really would be problems, but there was obviously enough rope there
to hang oneself with.
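
A rough illustration of the checksum hazard, not taken from any of the
programs mentioned: the two hypothetical functions below sum the same
bytes, one reading them as signed and one as unsigned.  The sketch assumes
8-bit two's-complement chars.  Either function alone is self-consistent;
the trouble starts if the effective choice changes from one access to the
next.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the bytes as signed values: a byte with the top bit set
   contributes a negative number. */
long sum_signed(const char *buf, size_t n)
{
    const signed char *p = (const signed char *)buf;
    long sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Sum the same bytes as unsigned values: every byte contributes a
   non-negative number. */
long sum_unsigned(const char *buf, size_t n)
{
    const unsigned char *p = (const unsigned char *)buf;
    long sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}
```

For the two bytes 'A' and 0x80 on an ASCII machine, sum_unsigned() gives
193 and sum_signed() gives -63, so a checksum written by one and verified
by the other would not match.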

I guess my overall reaction is that there's a good chance that inconsistent
sign extension wouldn't foul up too many things, but I would hate to have
to bet money on it.

The current draft of the ANSI standard says:

	... If [things other than `ordinary' characters] are stored
	in a char object, the behavior is implementation-defined: the
	values may be treated as either signed or non-negative integers.
	[Section 2.2.5, draft of 21 Aug 1984]

	Implementation-defined behavior -- behavior that depends on the
	characteristics of the implementation and that must be documented
	for each implementation.  [Section 1.1, draft of 21 Aug 1984]

The wording could probably be improved, but the current version seems to
say that you had better be able to document just how your chars behave,
rather than just saying that sign extension occurs or doesn't occur at
random.  (Note that compiler optimizations etc. may change the instructions
actually used to access a character variable, so the source code isn't a
reliable guide to which kind of access you get unless the compiler is very
careful.)
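
One way to picture "documented, not random" in present-day C is sketched
below.  It assumes an implementation that provides <limits.h> with
CHAR_MIN and UCHAR_MAX, which post-dates the draft quoted above, and
two's-complement chars; the function names are hypothetical.  The point is
only that the documented answer and the observed behavior should agree
every time.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* What the implementation documents: CHAR_MIN is negative exactly
   when plain char is a signed type. */
int documented_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* What actually happens: copy a byte with every bit set into a plain
   char and see whether it widens to a negative int. */
int observed_char_is_signed(void)
{
    unsigned char raw = UCHAR_MAX;
    char c;

    memcpy(&c, &raw, 1);
    return c < 0;
}

/* An implementation that sign-extends "at random" could fail this. */
void check_char_behaviour(void)
{
    assert(documented_char_is_signed() == observed_char_is_signed());
}
```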

> Note that if consistency is desired, the "most optimal" choice will vary
> with the application.  If lots of references are made to char variables
> via pointers, the choice will be sign-extended chars; if lots of references
> are made to ordinary variables (or anything requiring an offset from a
> pointer), the choice will be unsigned chars.  Which access type predominates?

Chars are accessed an awful lot via pointers, since that's how all string
manipulation is done in C.  I would think that simple char variables and
offset references would be rather less common than just "*cp".
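
For reference, the sort of "*cp" loop meant here, as a small sketch with a
hypothetical function name.  It counts byte values in a string; the cast
to unsigned char is what keeps the table index non-negative whichever way
plain chars are extended.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count occurrences of each byte value in a NUL-terminated string.
   counts must have UCHAR_MAX + 1 elements, zeroed by the caller. */
void count_bytes(const char *cp, unsigned long counts[])
{
    assert(cp != NULL && counts != NULL);
    for (; *cp != '\0'; cp++)
        counts[(unsigned char)*cp]++;   /* index is never negative */
}
```

Without the cast, a byte with the top bit set would index before the
start of the table on a machine that sign-extends chars.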
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry


