preprocessing: character- or token-based?

T. William Wells bill at proxftl.UUCP
Fri Aug 26 14:44:54 AEST 1988


: Without quoting exact passages, the Rationale seems to imply that the
: Standard is written in such a way as to allow the preprocessor to be
: either character based or token based.  It seems to me that either that
: implication is false, or else the Standard leaves things like:
:
: #define  aplus  +a
:      +aplus;       /* one operator, or two?   */
:
: ambiguous in meaning.  Can anyone point me (quoting exact passages :-)
: to a place in the Standard which precisely defines what happens in this
: case?  It seems to me that the problem is that the Standard nowhere defines
: the meaning of the part of translation phase 7 called "Preprocessing
: tokens are converted into tokens."

Two operators.  The sentence is unambiguous, especially since
the statement is preceded by "White-space characters separating
tokens are no longer significant."  If the conversion in phase 7
worked on characters rather than on preprocessing tokens, that
earlier statement would imply that the two `+' tokens get pasted
together *without* any white space left to separate them, giving
`++', a clearly ridiculous conclusion.
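
Since the expansion is two tokens, the difference is observable:
`+ +a' leaves a unchanged, while `++a' would increment it.  A
small test (my own example, not anything taken from the Standard)
shows which one you get:

    #include <stdio.h>

    #define  aplus  +a

    int main(void)
    {
        int a = 1;

        +aplus;                 /* expands to `+ +a': two unary operators */
        printf("a = %d\n", a);  /* prints "a = 1"; the expansion is not `++a' */
        return 0;
    }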

We can add to this, from 3.8.3 in the Rationale: "Preprocessing
is specified in such a way that it can be implemented as a
separate (text-to-text) pre-pass or as a (token-oriented) portion
of the compiler itself." The first part of this sentence is
asserting that the preprocessor can be designed as a program that
reads text and writes text, it is not implying anything about the
internal operation of the preprocessor.  The second part implies
that the preprocessor operates on tokens.
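
This also shows what a text-to-text pre-pass has to do: when it
writes its output, it must insert white space wherever two
adjacent tokens would otherwise read back as a single, different
token.  A rough sketch of such a check (purely hypothetical code,
not anything the Standard or Rationale requires):

    #include <string.h>

    /* Hypothetical helper for a text-to-text pre-pass: returns 1 if
     * writing the text of `next' immediately after the text of `prev'
     * would read back as a single, different token, so a separating
     * space must be emitted.  Only a few troublesome pairs are shown.
     */
    static int needs_space(const char *prev, const char *next)
    {
        static const char *pairs[][2] = {
            { "+", "+" },    /* `+' `+'  would read back as `++' */
            { "-", "-" },    /* `-' `-'  would read back as `--' */
            { "<", "<" },    /* `<' `<'  would read back as `<<' */
            { "=", "=" },    /* `=' `='  would read back as `==' */
        };
        size_t i;

        for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
            if (strcmp(prev, pairs[i][0]) == 0
                && strncmp(next, pairs[i][1], strlen(pairs[i][1])) == 0)
                return 1;
        return 0;
    }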

---
Bill
novavax!proxftl!bill
