Need COBOL Grammar

Esmond Pitt ejp at bohra.cpg.oz
Thu Dec 14 12:28:05 AEST 1989


In article <706 at dsachg1.UUCP> zhb2165 at dsachg1.UUCP (Ned D Hanks) writes:
>I am in the need of grammar for COBOL. Any format would be ok but
>lex and yacc would be best.

This question comes around every year or so. You can do a 'yacc'
grammar for Cobol (-85, I assume), but it's very difficult.

1. Cobol-85 is neither (i) LR(k) for any k, (ii) context-free, nor even
(iii) regular.  Anybody contemplating 'lex'/'yacc' for COBOL who
doesn't know what the above means is advised to forget all about it
straight away and do it in recursive descent in C with a hand-written
scanner. You need a good appreciation of these three issues to understand
how to get around them with tools such as 'lex' and 'yacc' which rely
on these properties.  Cobol-74 is slightly better from this point of
view, but not all that much.

2. Cobol-85 has 400+ reserved words, and this alone will bust most
yacc's unless they are greatly enlarged, which means you need the
source or a co-operative vendor.

3. The grammar requires semantic feedback at various points, which
means you have to build quite a lot of the compiler even if that's not
what you're going to use it for.

4. Lexical problems: Keywords, identifiers, and literals can be
continued across line boundaries. There are two distinct rules for
continued tokens. There are three distinct context-dependent scanning
modes (normal, PICTURE string, comment-entry), and the last of these is
not very well specified from the implementor's point of view.  There
are two distinct definitions of a token, depending on whether you are
doing Source Text Manipulation or compiling proper. And so on and so
forth.
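
To make the scanning-mode point in item 4 concrete, here is a minimal
sketch in C of a hand-written scanner that switches between the three
modes. It is not the scanner described in this post. All the names
(`struct scanner`, `next_token` and so on) are hypothetical, continuation
lines and literals are ignored, and the rule that a comment-entry runs
to the end of its line is an assumption made only to keep the sketch
short, since that boundary is exactly the part that is poorly specified.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum scan_mode { MODE_NORMAL, MODE_PICTURE, MODE_COMMENT_ENTRY };
enum tok_kind  { TOK_END, TOK_WORD, TOK_PERIOD, TOK_OTHER,
                 TOK_PICTURE_STRING, TOK_COMMENT_ENTRY };

struct scanner {
    const char *p;        /* next unread character */
    enum scan_mode mode;  /* mode in which the next token is scanned */
    int after_header;     /* previous token was AUTHOR, SECURITY, etc. */
};

/* Case-insensitive equality of two words. */
static int word_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
    return *a == *b;      /* true only if both strings ended together */
}

/* Paragraph headers whose body is a comment-entry. */
static int is_comment_header(const char *w)
{
    static const char *const names[] = {
        "AUTHOR", "INSTALLATION", "DATE-WRITTEN", "DATE-COMPILED", "SECURITY"
    };
    size_t i;
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (word_eq(w, names[i]))
            return 1;
    return 0;
}

/* Copy [start, end) into buf as a string, truncating to fit. */
static void copy_span(char *buf, size_t bufsz, const char *start, const char *end)
{
    size_t n = (size_t)(end - start);
    if (n >= bufsz)
        n = bufsz - 1;
    memcpy(buf, start, n);
    buf[n] = '\0';
}

void scanner_init(struct scanner *s, const char *text)
{
    s->p = text;
    s->mode = MODE_NORMAL;
    s->after_header = 0;
}

enum tok_kind next_token(struct scanner *s, char *buf, size_t bufsz)
{
    const char *start;
    assert(s != NULL && buf != NULL && bufsz > 0);

    /* Skip white space, but do not run past the end of the line while
       looking for a comment-entry, which may legitimately be empty. */
    while (isspace((unsigned char)*s->p) &&
           !(s->mode == MODE_COMMENT_ENTRY && *s->p == '\n'))
        s->p++;
    start = s->p;
    buf[0] = '\0';
    if (*start == '\0')
        return TOK_END;

    if (s->mode == MODE_COMMENT_ENTRY) {
        /* ASSUMPTION: the entry is whatever remains on this line. */
        while (*s->p && *s->p != '\n')
            s->p++;
        copy_span(buf, bufsz, start, s->p);
        s->mode = MODE_NORMAL;
        return TOK_COMMENT_ENTRY;
    }
    if (s->mode == MODE_PICTURE) {
        /* A picture string may contain '.', so read up to white space,
           then hand back a final '.' as the sentence terminator. */
        while (*s->p && !isspace((unsigned char)*s->p))
            s->p++;
        if (s->p - start > 1 && s->p[-1] == '.')
            s->p--;
        copy_span(buf, bufsz, start, s->p);
        if (word_eq(buf, "IS"))
            return TOK_WORD;          /* optional word; stay in this mode */
        s->mode = MODE_NORMAL;
        return TOK_PICTURE_STRING;
    }
    if (*s->p == '.') {               /* normal mode from here on */
        s->p++;
        copy_span(buf, bufsz, start, s->p);
        if (s->after_header)
            s->mode = MODE_COMMENT_ENTRY;
        s->after_header = 0;
        return TOK_PERIOD;
    }
    if (isalnum((unsigned char)*s->p)) {
        while (isalnum((unsigned char)*s->p) || *s->p == '-')
            s->p++;
        copy_span(buf, bufsz, start, s->p);
        s->after_header = is_comment_header(buf);
        if (word_eq(buf, "PICTURE") || word_eq(buf, "PIC"))
            s->mode = MODE_PICTURE;
        return TOK_WORD;
    }
    s->p++;                           /* anything else: one character */
    copy_span(buf, bufsz, start, s->p);
    s->after_header = 0;
    return TOK_OTHER;
}
```

Given `01 AMOUNT PIC IS 9(3).99.`, the period inside `9(3).99` stays
part of the picture string while the final period comes back as a
separate token. The same character sequence scanned in normal mode
would be split differently, which is why the mode has to be driven by
the preceding tokens rather than by the characters alone.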

On the other hand, I've done both a 'yacc' grammar and a 'lex' scanner
for Cobol-85. In their present state they're undoubtedly
incomprehensible to anybody but me. This work may turn into a product
one day so I'm not about to release it to the world.

I also had to speed up my yacc, as originally it took about 15 minutes
to produce a parser (on a Pyramid, that is). Bison was better (a couple
of minutes).


-- 
Esmond Pitt, Computer Power Group
ejp at bohra.cpg.oz


