Re^2: Why nested comments not allowed?

Tue Feb 20 07:29:50 AEST 1990

From article <4320 at daffy.cs.wisc.edu>, by schaut at cat9.cs.wisc.edu (Rick Schaut):
> I think you've missed the point.  In compilers for languages that do not
> allow nested comments the parser never sees the comment at all.  The comments
> are eaten by the scanner (which is a much simpler part of the compiler than
> is a parser).  Essentially, any language that requires balancing characters
> (e.g. the language of balanced parens) cannot be represented using regular
> expressions, and regular expressions are the construct upon which scanners
> are based.  In short, a compiler for a language that doesn't allow nested
> comments is _much_ faster than a compiler for a language that allows them.

The last sentence doesn't follow from the rest of the paragraph.
Scanners may be *based* on regular expressions, but the popular
scanners (Lex, Flex, and friends) are not *restricted* to regular
expressions.  In fact, as people have often pointed out, matching
comments with regular expressions can be dangerous with some scanners
because long comments can overflow fixed-size buffers.  A common
work-around is to detect the beginning of a comment by a regular
expression and call a function (in C, perhaps) to eat the rest of the
comment.  This avoids the buffer-overflow problems and makes it
trivial to parse nested comments---just count the number of
<begin-comment> tokens and match them with <end-comment> tokens.
Nothing slow about that.
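
For concreteness, here is a rough sketch of the kind of function I
mean.  It is only an illustration: the name skip_nested_comment is
made up, the delimiters are assumed to be "/*" and "*/", and it walks
a string in memory rather than pulling characters from the scanner's
input as a real Lex action would.

```c
#include <stddef.h>

/*
 * Hypothetical helper, called after the scanner has already matched
 * the opening delimiter.  `s` points just past that opener.
 * Returns a pointer just past the matching closer, or NULL if the
 * input ends while a comment is still open.
 * Assumes "/" "*" opens a comment and "*" "/" closes one.
 */
const char *skip_nested_comment(const char *s)
{
    int depth = 1;                      /* the opener already seen */

    while (*s != '\0') {
        if (s[0] == '/' && s[1] == '*') {
            depth++;                    /* nested <begin-comment> */
            s += 2;
        } else if (s[0] == '*' && s[1] == '/') {
            s += 2;                     /* <end-comment> */
            if (--depth == 0)
                return s;               /* outermost comment closed */
        } else {
            s++;                        /* ordinary comment text */
        }
    }
    return NULL;                        /* unterminated comment */
}
```

The loop looks at each character once and keeps a single counter, so
the cost is linear in the length of the comment whether or not
nesting is allowed.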
-- 
Mike Coffin				mike at arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2}!arizona!mike
Tucson, AZ  85721			(602)621-2858