Why nested comments not allowed?

Wed Feb 21 15:00:15 AEST 1990

In article <236100027 at prism>, ly at prism.TMC.COM writes:
> 	I'm just curious to know why nested comments are not allowed in many 
> 	languages.

To start with, some languages _do_ allow them.  For example,
Common Lisp has   #|...comment...|#  which nests.

There is the obvious point that nested structures of any kind are
not definable with regular expressions (and LEX is not the _only_
r.e. tool around, you know).

But the *real* reason is that they simply don't work.  Imagine a
Pascal dialect which admits nested comments.  Comments are used to include
natural-language text in the program, so we have to allow things like
	{This is a `quotation'}
But program text may legitimately contain
	fred := '}';
and when we comment it out by wrapping {..} around it we get
	{ fred := '}'; }
In order to handle this code fragment, we mustn't take the "}" following
the "'" as a closing bracket, but in order to handle the text fragment
we *must* take the "}" following the "'" as a closing bracket.

We could easily arrange for comments to be viewed as unstructured
except for comment brackets being significant.  That's what Common Lisp
does, and it's what's usually done when nested comments are provided.
But that means that wrapping a *valid* statement in comment brackets
may produce *invalid* text.
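To make that concrete, here is a minimal sketch in C of a scanner for the
imagined Pascal dialect.  The function name strip_nested_comments and its
interface are made up for illustration; it assumes comment bodies are
unstructured, so that only "{" and "}" count inside a comment, while quotes
are honoured in program text outside comments.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for callers that check the result */

/* Copy src to dst, dropping {...} comments, which may nest.
   Inside a comment only '{' and '}' are significant; quotes are ignored
   there, which is exactly the "unstructured" treatment.
   dst must have room for strlen(src) + 1 bytes.
   Returns the comment depth still open at end of input (0 if balanced). */
int strip_nested_comments(const char *src, char *dst)
{
    int depth = 0;       /* current comment nesting depth */
    int in_string = 0;   /* inside a '...' literal in program text */

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        char c = *src;
        if (depth > 0) {             /* comment body: count brackets only */
            if (c == '{')
                depth++;
            else if (c == '}')
                depth--;
            continue;
        }
        if (in_string) {             /* program text inside a literal */
            *dst++ = c;
            if (c == '\'')
                in_string = 0;
        } else if (c == '{') {       /* a comment opens */
            depth = 1;
        } else {                     /* ordinary program text */
            *dst++ = c;
            if (c == '\'')
                in_string = 1;
        }
    }
    *dst = '\0';
    return depth;
}
```

Given	fred := '}';	the sketch copies the statement through unchanged, and
given	{This is a `quotation'}	it yields nothing, as it should.  But given
	{ fred := '}'; }
the comment ends at the first "}", and what survives is
	'; }
which is an unterminated string literal.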
We could easily arrange for comments to be viewed as sequences of
programming language tokens.  Pop-2 did that.  Commenting out code
fragments would work well done that way, but you'd have trouble with
text.  In fact Pop-2 programmers used to have to write
	comment `This is text written as a string so that it can'
		`be included in a comment without being parsed as'
		`Pop tokens';
Not good.

We conclude that there are two *different* things:
    (a) marking a sequence of tokens so that the processor will behave
	as though those tokens were not present
    (b) including text which does not follow the lexical rules of the
	programming language in question

In C, we use #if/#endif (which nest!) for (a) and /* ... */ (which does
not nest) for (b).

Another clue that (a) and (b) are different is that there is usually
some _reason_ why the sequence of tokens in (a) is not to be included,
but no reason is needed for (b) because non-token text could _never_
have been part of the program proper.  This also suggests that it might
be a good idea to explicitly label type (a) "comments" with the reason.
In C, for example, we would have
	#if	DEBUGGING
		....
	#endif
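A slightly fuller sketch of the same idea, showing that #if groups nest and
that an ordinary /* ... */ comment can sit inside a disabled region without
ending it.  The macro names DEBUGGING and TRACE_TOTAL and the function
checked_sum are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUGGING   0   /* hypothetical switch: 0 disables the region */
#define TRACE_TOTAL 1   /* hypothetical switch for the nested region */

/* Sum the first n elements of v. */
int checked_sum(const int *v, int n)
{
    int total = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        total += v[i];
#if DEBUGGING               /* the stated reason: debugging output only */
        /* an ordinary comment in here does not end the disabled region */
        printf("v[%d] = %d\n", i, v[i]);
#if TRACE_TOTAL             /* nested group, skipped along with its parent */
        printf("running total %d\n", total);
#endif
#endif
    }
    return total;
}
```

With DEBUGGING defined as 0 both printf calls vanish, the inner #if/#endif
pair included, and the label on the #if records why.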