Summary: 'C', is its grammar context sensitive?

Cedric Ramsey ramsey at NCoast.ORG
Fri Aug 31 08:34:40 AEST 1990


/*
 ** I am posting this to comp.lang.c again; I don't think the mailer sent it
 ** the first time.
 */

>>	Hello again ! This question is directed towards the 'C' and
>>compiler gurus out there. I was studying the grammar for the 'C'
>>language and I couldn't help but notice that for declarations the
>>grammar is context sensitive.
>[...]
>>Since the 'typedef-name' is an identifier, it is impossible to determine that
>>it is a type definition without looking at the context. I guess that one could
>>do a pre-scan of the source code and build typedef trees but I thought that
>>'C' had a context-free grammar.
>
>You've hit on a common problem in parsing C -- the typedef vs.
>identifier issue.
>
>Indeed, C is `context sensitive' in that a typedef name has a very
>different syntactic value from that of an ordinary identifier.
>Nevertheless, parser generators that handle only simpler languages do
>very well with C.  How do they resolve this contradiction?  They
>discard a certain sort of purity and introduce an informal feedback.
>
>In processing the program, a compiler will be maintaining a symbol
>table, and keeping typedef names in it.  It is a simple matter to make
>the lexer inspect the symbol table when processing an identifier, and
>to return a different token type for an identifier that has appeared
>in a typedef.  In this way, the grammar has different lexeme types for
>identifiers and typedefs, and the context sensitivity goes away.
>
>Every C compiler that I've studied has this feedback mechanism between
>the semantic phase and the lexer.  It's a familiar solution to the
>problem.
>
>Kevin, KE9TV
>until 8/31: kenny at cs.uiuc.edu
>after 9/17: ke9tv at nrtc.northrop.com
>
>
When I wrote the last message I had the notion that typedef names
could be used before they are declared in ANSI 'C'. I guess that was
a false assumption, because the following, I think, is illegal:

typedef struct vehical {
  make_t make;
  style_t style;
  owner_t owner;
} vehical_t;

typedef ... owner_t;
typedef ... make_t;
typedef ... style_t;

The compiler must know ahead of time that make_t, style_t and owner_t
are type names. That way the scanner can look up the name in the
symbol table, see that it is a typedef name, and return TYPEDEF_NAME.
I don't have the ANSI draft, only K&R2. K&R2 doesn't mention, at least
as far as I recall, that typedef names must be declared before they are
used, so these points are purely speculative. Also, K&R2 doesn't specify
whether the following is legal:

typedef unsigned char uchar_t;
uchar_t uchar_t;

I would say that this is illegal, even though uchar_t is not a keyword.
Why? Because ... I don't know, maybe because it would be harder to parse.
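
One way to check this on an ANSI compiler is a small test like the one
below. This is only a sketch, and the function name inner_scope_demo is
made up for illustration. As far as I understand the ANSI scoping rules,
typedef names share one name space with ordinary identifiers, so the
file-scope pair above should be rejected as a redeclaration, while the
same declaration inside a block should be accepted and simply hide the
typedef until the block ends.

```c
typedef unsigned char uchar_t;

/*
 * At file scope, adding `uchar_t uchar_t;` here would redeclare the
 * same name in the same scope as a different kind of symbol, so it is
 * left out; a compiler is expected to reject it.
 */

static int inner_scope_demo(void)
{
    /*
     * Inside a block: the first uchar_t is still the typedef name,
     * the second one declares a variable that hides the typedef for
     * the rest of this block.
     */
    uchar_t uchar_t = 65;

    /* From here on, uchar_t is an ordinary identifier (a variable). */
    return uchar_t + 1;
}
```

Note that this is exactly the case where the scanner's symbol table
lookup has to respect scopes: inside the block, after the declaration,
the same spelling must come back as a plain identifier.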

In light of the above speculations, the compiler wouldn't have to make
multiple passes to collect typedef names. I hope this is true; otherwise
the grammar at the back of K&R2 would not be acceptable to yacc, as claimed,
without a rewrite. I could be wrong, though.
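
The single-pass scheme described above (the scanner consults the symbol
table and returns TYPEDEF_NAME for any name already declared by a
typedef) can be sketched roughly as follows. This is a minimal
illustration, not taken from any real compiler: only the token name
TYPEDEF_NAME comes from this discussion, while IDENTIFIER,
symtab_add_typedef, classify and the fixed-size table are hypothetical
names chosen for the example, and scopes are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token kinds the scanner can hand to the parser for a name. */
enum token { IDENTIFIER, TYPEDEF_NAME };

/* Toy symbol table: a flat list of names declared by typedef so far. */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs = 0;

/*
 * Called from the parser's semantic action once a typedef declaration
 * has been fully parsed. Returns 0 on success, -1 if the table is full.
 */
static int symtab_add_typedef(const char *name)
{
    assert(name != NULL);
    if (n_typedefs >= MAX_TYPEDEFS)
        return -1;
    typedef_names[n_typedefs++] = name;
    return 0;
}

/*
 * Called by the scanner for every identifier-shaped lexeme. This is the
 * feedback step: the answer depends on what has been declared so far.
 */
static enum token classify(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < n_typedefs; i++) {
        if (strcmp(typedef_names[i], name) == 0)
            return TYPEDEF_NAME;
    }
    return IDENTIFIER;
}
```

With this arrangement, classify("owner_t") yields IDENTIFIER until
symtab_add_typedef("owner_t") has been called, and TYPEDEF_NAME
afterwards. That is why the vehical_t example above could not be parsed
in one pass if typedef names were allowed to be used before their
declarations.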

If I am correct thus far, I would go further and say that identifiers
must be declared before they are used, whether as declarators
or as typedef names.

What is the verdict? Can I safely assume this stuff, or should I send off
for the ANSI 'C' standard at my first financial opportunity?

If you guys agree that 'C' is context sensitive, then what languages
truly are context-free, if any?




