error recovery

Henry Spencer henry at utzoo.uucp
Tue May 2 05:52:27 AEST 1989


In article <4595 at goofy.megatest.UUCP> djones at megatest.UUCP (Dave Jones) writes:
>> ... Have the parser tell the scanner what
>> kind of tokens it wants at each point, rather than just asking for "the 
>> next token", and do the error recovery in the scanner.  The parser
>> always sees a syntactically correct program, and never has to get into
>> the messy business of popping an activation stack.
>
>This corresponds to the action which an LR parser
>takes as it gobbles tokens from the input until a token can legally be
>shifted. In the scheme you describe, is there anything equivalent to popping
>the LR-state stack?

Nothing direct; the parser always sees syntactically correct input and
never has to pop.  However, it's wise for the scanner to include some
sort of equivalent.  If there is an agreed-on notion of "line terminator"
tokens, e.g. semicolons, the scanner can generate false input or discard
real input to keep line terminators in sync with the parser's requests
for same.  Combined with a simple low-level heuristic or two, this works
surprisingly well.
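
A minimal sketch of the idea, under loudly stated assumptions: the toy
language (one character per token, statements of the form "i=n;"), the
names Scanner, want() and parse_program(), and the repair counter are all
invented for illustration and are not taken from any real compiler.  The
parser names the token it wants; the scanner delivers it, fabricating or
discarding as needed, and resynchronizes on the semicolon.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy language, one character per token:
   'i' identifier, '=' assignment, 'n' number, ';' terminator. */
typedef struct {
    const char *p;   /* remaining real input */
    int repairs;     /* tokens fabricated or discarded so far */
} Scanner;

/* Skip blanks and look at the next real token; '\0' is end of input. */
static char peek(Scanner *s)
{
    while (*s->p == ' ')
        s->p++;
    return *s->p;
}

/* The parser says which token it wants; the scanner always supplies it. */
static void want(Scanner *s, char kind)
{
    if (kind == ';') {
        /* Resynchronize: discard real input up to the terminator. */
        while (peek(s) != ';' && peek(s) != '\0') {
            s->p++;
            s->repairs++;
        }
        if (peek(s) == ';')
            s->p++;          /* real terminator found */
        else
            s->repairs++;    /* input ran out: fabricate one */
        return;
    }
    if (peek(s) == kind)
        s->p++;              /* real token, consume it */
    else
        s->repairs++;        /* fabricate it, leave real input alone */
}

/* statement: 'i' '=' 'n' ';'
   Note the absence of error paths: the parser has nothing to pop.
   Returns the number of statements; *repairs gets the repair count. */
int parse_program(const char *text, int *repairs)
{
    Scanner s = { text, 0 };
    int statements = 0;

    assert(text != NULL);
    while (peek(&s) != '\0') {
        want(&s, 'i');
        want(&s, '=');
        want(&s, 'n');
        want(&s, ';');
        statements++;
    }
    if (repairs != NULL)
        *repairs = s.repairs;
    return statements;
}
```

With this sketch, "i=;" parses as one statement with one fabricated
number.  "i=n i=n;" (missing semicolon) parses as one statement with three
discarded tokens, which is the token-"stealing" worry in miniature: the
second statement is sacrificed to get back in sync.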

In principle, doing this at multiple levels (which is trickier) is better,
but in practice it doesn't seem worthwhile.

>It would seem to me, that to accomplish this same kind of "snipping",
>in an LL parser, some kind of longjump, or error induced short circuiting
>would be necessary, in order to abort the productions which should not
>be completed.  It seems wrong to force the completion of such productions,
>willy nilly, with tokens from the input stream, and in doing so, perhaps
>"stealing" tokens from other productions which might otherwise be completed
>successfully...

It may seem "wrong", but on the other hand, it ensures that later phases
see a complete and consistent picture, and avoids a cascade of code
everywhere in the compiler to deal with the consequences of simple syntax
errors.

>If I understand the proposed method correctly, I can even
>think of situations in which rather simple mistakes would cause productions
>to be done using tokens which the author obviously did not intend to go
>together.

It can happen.  In practice it's not a serious problem.  That's actually
a good summation of the method as a whole:  it may sound inelegant but
it works well in reality.  It's simple to implement and it lets the rest
of the compiler ignore the possibility of syntax errors.  The rumors of
difficulty are greatly exaggerated.
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry at zoo.toronto.edu


