Is this correct action for the c compiler/preprocessor ??

L.Rosler lr at sftig.UUCP
Sat Nov 16 15:27:37 AEST 1985


> The question was whether the C preprocessor should substitute for an
> occurrence of a macro formal within a string within the body of the
> macro...
>
> > Being able to insert literal text in strings is very useful.
> 
> The fact that a feature is "useful" is not sufficient argument that it is
> correct.
> 
>                                                the definition that most of
> us use these days (K&R) says one thing:
> 	Text inside a string or a character constant is not subject to
> 	replacement.
> ...which is pretty explicit, but the compiler that a lot of us use
> substitutes inside strings.  I would like to have an authoritative
> definition and a correct compiler in accord with the definition.
> -- 
> Dick Dunn

Having been involved in many aspects of this fiasco, I'll give a
capsule history.

The original C preprocessor, designed and implemented by Mike
Lesk of AT&T Bell Labs for the PDP-11, did not substitute inside strings
(hence, the disclaimer in K&R).

The preprocessor distributed with VAX UN*X, hence picked up by
UCBerkeley, was implemented by John Reiser.  In addition to being
much faster than the original, it included many "features"
which were documented only in a file /usr/src/cmd/cpp/README,
dated August 25, 1978 (after the publication of K&R).
The file is still there, though updated -- look and see!

Among the features included without a great deal of review
were the "magic disappearing comment" used to glue tokens
together (despite K&R p. 179 "...comments...serve to separate tokens")
and the issue at hand of substituting within strings
(and character constants, for that matter, though no one seems
to pay much attention to this part of the issue).  The only
justification for the latter seems to be K&R p. 207:
"Each occurrence of an identifier mentioned in the formal
parameter list of the definition is replaced by the corresponding
token string from the call."

When I championed these features before the ANSI X3J11 C Committee
(most of whom had implemented a preprocessor according to the K&R
description, not the UN*X code), I first had to convince the
Committee that they were useful.  Several UN*X headers and
Alan Feuer's "The C Puzzle Book" helped here.

But I could not convince the Committee that the way the
features were implemented was acceptable, despite the tons of
code that incorporated them.  Reliance on undocumented
(what README file?!?) capabilities of a particular implementation
which contravened the clear sense of the de facto standard did
not fall under the purview of the Committee's goal of not
breaking existing "valid" code.

Several syntaxes were proposed, some of which were as simple
to implement as a new directive "#defines," meaning in THIS
macro, substitute for identifiers inside strings.
But they all foundered on the simple point that there ARE
no identifiers inside strings!  Strings and identifiers are
each "tokens," and writing a grammar to parse strings into
tokens was considered too outrageous.

(Note that "tokens" can turn up in surprising places:

#define PRINT(s) printf("%s", s)

produces remarkable results on UN*X compilers.)
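
A minimal sketch of the difference, for illustration only.  FORMAT, its
buf parameter, and print_demo are hypothetical names, and sprintf stands
in for printf so the result can be checked.  The in-string expansion
described in the comment is an assumption based on the behavior discussed
above, not output captured from a UN*X compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as PRINT(s) above, but writing into a buffer. */
#define FORMAT(buf, s) sprintf((buf), "%s", s)

/*
 * With no substitution inside strings, FORMAT(out, msg) expands to
 *     sprintf((out), "%s", msg)
 * A preprocessor that substitutes formals inside strings would
 * presumably also rewrite the s in "%s", yielding
 *     sprintf((out), "%msg", msg)
 * which is not a meaningful format.
 */
static void print_demo(void)
{
    char out[16];
    const char *msg = "hello";

    FORMAT(out, msg);
    /* Holds only when the string literal is left untouched. */
    assert(strcmp(out, "hello") == 0);
}
```

Calling print_demo() passes silently on a preprocessor that leaves
string literals alone.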

So the Committee resorted to invention: # identifier
meaning "stringize" the argument token-string substituted for
the identifier; and token1 ## token2
meaning concatenate the two tokens nearest the ## after
all other substitutions.  The latter will be easy to substitute
mechanically for /**/, but the former will require some work.
Each of them has some advantages over the UN*X way,
not the least of which is that they don't do violence to
the rest of the language.
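
A short sketch of the two operators as described above, assuming a
preprocessor that implements them.  The macro and function names (STR,
GLUE, FORMAT_INT, glue_demo, stringize_demo) are hypothetical.
FORMAT_INT relies on adjacent string literals being concatenated, which
is how # can stand in for the old substitute-inside-strings idiom.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* # identifier: turn the argument's spelling into a string literal. */
#define STR(x) #x

/* token1 ## token2: paste two tokens into a single token. */
#define GLUE(a, b) a ## b

/* #expr is spliced into the format by adjacent-literal concatenation. */
#define FORMAT_INT(buf, expr) sprintf((buf), #expr " = %d", (expr))

static int glue_demo(void)
{
    int GLUE(count, 1) = 41;    /* declares a variable named count1 */

    return count1 + 1;
}

static void stringize_demo(void)
{
    char buf[64];
    int x = 6;

    assert(strcmp(STR(a + b), "a + b") == 0);
    FORMAT_INT(buf, x * 7);     /* format string is "x * 7 = %d" */
    assert(strcmp(buf, "x * 7 = 42") == 0);
}
```

Where old code wrote a/**/b to glue tokens, GLUE(a, b) is the mechanical
replacement.  Macros that relied on substitution inside a string need to
be restructured along the lines of FORMAT_INT.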

Even though I'm not happy with the idea of standards
committees inventing solutions that invalidate existing
solutions, I buy into this case.  As Henry Spencer warns,
don't use the UN*X features, and wait for the ANSI Standard
to provide better ways.

Sorry to be so long-winded, but this history HAD to be told.

Larry Rosler, AT&T Information Systems
(Editor, ANSI X3J11 C Standards Committee)
ihnp4!attunix!lr, 201-522-5086


