mod.std.c Digest V2 #12

Tue Jan 29 11:55:23 AEST 1985

mod.std.c Digest            Mon, 28 Jan 85       Volume 2 : Issue  12 

Today's Topics:
                    What is the current standard?
             (notes) Standard C Digest - V2 #6           
                         #endif token-string (2)
----------------------------------------------------------------------

Date: Thu, 10 Jan 85 20:02:24 pst
From: cbosgd!ucbvax!ucsfcgl!arnold (Ken Arnold)
Subject: What is the current standard?
To: @ucbvax.UCB-VAX:cbosgd!std-c

In article Henry Spencer writes
>To quote the K&R C reference manual (henceforth "CRM"), section 12.1
>(emphasis added):
>
>	A [#define] causes the preprocessor to replace subsequent
>	instances of the identifier with the given string of tokens...
>	Each occurrence of a [macro parameter] is replaced by the
>	corresponding token string from the call...  *Text inside a
>	string or a character constant is not subject to replacement*.
>
>In other words, replacement inside strings -- be it for macros or
>macro parameters -- is a non-standard extension.  It's a "feature" of
>the Reiser C preprocessor, which is omnipresent in Unix C compilers
>but not in others.  The closest thing we have to an implementation-
>independent standard for C is the CRM, which explicitly outlaws replacement
>inside strings.
>
>I agree that this will break a number of things, including 4.2BSD.  How
>sad.  Those programs, including 4.2BSD, were implementation-dependent
>to begin with, and the authors have no right to cry about it.  It should
>be clear from this that I disagree with the committee's expressed intent
>to add such a capability later.  The current draft standard's neat new
>string-concatenation convention (adjacent string literals -- note this
>is literals only -- are concatenated at compile time) eliminates the
>need for in-string replacement as a way to build filenames out of #defined
>pieces, which to my mind was the only real need for in-string replacement.
>
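
A minimal sketch of the string-concatenation convention described in the
quoted text, assuming a compiler that implements it.  LIBDIR, PROGNAME,
RCFILE and the path itself are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pieces; the names and paths are illustrative only. */
#define LIBDIR   "/usr/lib"
#define PROGNAME "example"

/* Adjacent string literals are joined by the compiler into one literal. */
#define RCFILE   LIBDIR "/" PROGNAME ".rc"

static const char *rcfile_path(void)
{
    return RCFILE;
}

/* Small self-check of the joined result. */
static void check_rcfile(void)
{
    assert(strcmp(rcfile_path(), "/usr/lib/example.rc") == 0);
}
```

Note that this only joins whole literals; it does not substitute a macro
parameter into the middle of a string.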

Well, we all remember that K&R is not a standard, but it is an attempt
to describe how the language works.  At best it is a *de-facto*
standard.  However, since this feature is "omnipresent in UNIX C
compilers" (it is also true in DECUS C, and probably in others), that
sort of seems like a de-facto standard, too.  So, which de-facto
standard are you going to follow?

If all UNIX C compilers use it, and many other C compilers use it, too,
it seems to me that we should be centering the standard around the
language *as used*, not according to K&R.  Note that, as it stands, the
existence of parameter replacement inside strings cannot be a
non-standard extension since there is no standard and it is extremely
common.  You might, at worst, call it a standard extension.  But the
ANSI standard ought to encompass normal usage, and this is part of
normal usage.

Also, if the committee has been willing to bend over backwards to
accommodate people with 6 character loaders who don't want to work
around the problem, we surely can do something as reasonable as not
breaking C code written under UNIX.  I consider myself an informed C
person, but this is the first time it has come to my attention that
this is not in K&R.  How many other people do you think have used
this?  Shall we hew to a widely ignored statement in a non-standard, or
to broad, nearly universal, actual usage?

I must also disagree with the last statement about any replacement for
this feature.  There is no replacement in the standard.  If there is no
parameter substitution in strings, concatenation of strings doesn't
solve anything.  I have used replacement inside strings for many other
valid purposes, not including concatenating existing strings.  The
"assert" macro, for example, can use this feature to print out which
assertion it was that botched.

	# define	assert(x)	{ \
		if (!(x)) \
			fprintf(stderr, \
				"assertion \"x\" botched, line %d, \"%s\"\n", \
				__LINE__, __FILE__); \
		}
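
For comparison, a sketch of how the same message can be produced without
any replacement inside string literals, assuming a preprocessor that
supports the `#` stringizing operator found in ANSI C as eventually
published.  botch_message, BOTCH_TEXT and check_botch_text are
hypothetical names, not part of any library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the failure text into buf and returns buf. */
static const char *botch_message(char *buf, size_t size, const char *expr,
                                 const char *file, int line)
{
    snprintf(buf, size, "assertion \"%s\" botched, line %d, \"%s\"",
             expr, line, file);
    return buf;
}

/* #x turns the macro argument into a string literal; buf must be an array. */
#define BOTCH_TEXT(buf, x) \
    botch_message((buf), sizeof (buf), #x, __FILE__, __LINE__)

/* The argument is only stringized here, never evaluated. */
static void check_botch_text(void)
{
    char buf[256];

    assert(strstr(BOTCH_TEXT(buf, a > b),
                  "assertion \"a > b\" botched") != NULL);
}
```

The expression text reaches the message as an ordinary argument, so the
format string itself is never rewritten by the preprocessor.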

>> 	"As indicated by the syntax, a token must not follow a #else or
>> 	#endif directive before the terminating new-line character.
>> 	However, comments may appear anywhere on any source line,
>> 	including on a preprocessor directive."
>> 
>> This breaks many existing programs, including rmail, deroff, diction,
>> efl, eqn, learn, lint, nroff, refer, struct, troff, uucp, and ingres.
>
>Interestingly enough, I find *no* occurrences of the trouble-causing
>syntax in rmail, deroff, eqn, learn, lint, nroff, refer, struct, troff,
>or uucp on my system.  A quick inspection of the System V sources (we
>have, but don't run, System V) also comes up empty.  So, this change
>breaks Berklix and only Berklix programs; everybody else has been
>following the CRM, which makes no provision for trailing tokens on
>#else and #endif.  This is a non-standard and implementation-dependent
>extension.
>
>I have no personal objections to this one, although I think the syntax
>ought to be specific (i.e., one identifier only) rather than wide-open
>(any random tokens).
>

Again, the committee is willing to protect people with archaic loaders
from working around it, but it's okay to break a bunch of Berkeley 4BSD
code?  That's weird.  This seems also to be part of the infamous :->
Reiser C Preprocessor (since it works on our System V system, and in
DECUS C).  See above discussion.

Also, restricting it to one token doesn't handle a normal usage, which
is:

	# if    A || B
	...
	# endif A || B
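
For code that must also pass a preprocessor enforcing the draft wording
quoted above, the same annotation can be carried in a comment, which the
draft allows on any directive line.  A rough sketch; the values given to
A and B and the function feature_enabled are hypothetical:

```c
#include <assert.h>

/* Hypothetical configuration values for the illustration. */
#define A 0
#define B 1

static int feature_enabled(void)
{
#if A || B
    return 1;
#else  /* !(A || B) */
    return 0;
#endif /* A || B */
}

/* Small self-check: B is nonzero, so the first branch is compiled. */
static void check_feature(void)
{
    assert(feature_enabled() == 1);
}
```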

Since this "addition" can break no existing code following the supposed
"standard", let's just do it so it doesn't break anything at all...

		Ken Arnold

------------------------------

Date: Sun, 13 Jan 85 16:55:56 est
From: cbosgd!ima.UUCP!haddock!ism780!ism780b!jim
Subject: (notes) Standard C Digest - V2 #6           
To: ima!cbosgd!std-c

Henry Spencer writes:

>[...] The current draft standard's neat new
>string-concatenation convention (adjacent string literals -- note this
>is literals only -- are concatenated at compile time) eliminates the
>need for in-string replacement as a way to build filenames out of #defined
>pieces, which to my mind was the only real need for in-string replacement.

He has confused the in-string replacement problem with the string
concatenation problem.   String concatenation in macros has previously
been achieved by  foo/**/bar or

#define IDENT(x)x
IDENT(foo)bar

and the standard's string concatenation does deal with this sufficiently.
However, it does nothing for in-string replacement.  Henry's point
that the feature was not standard by K&R is well taken (however snidely put),
although it is not reasonable to assert that K&R *outlaw* replacement within
strings in the macro definition, since it is fairly clear that they were
referring to strings in the running text.
However, it is important to recognize the pragmatic side of Ken's
code-breaking admonition.  In particular, it would be impossible without
in-string replacement to implement the UNIX assert library facility, which is
a macro defined as

#define assert(EX) if (EX) ; else _assert("EX", __FILE__, __LINE__)

Therefore, I think the committee simply does not have the freedom to
exclude the in-string replacement feature.
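
The `if (EX) ; else` shape of that definition is what lets the macro be
used as a single statement ahead of an enclosing `else`.  A minimal
sketch of just that property; CHECK, record_failure, failures and
classify are hypothetical names, with record_failure standing in for
_assert():

```c
#include <assert.h>

static int failures;   /* hypothetical counter of failed checks */

/* Hypothetical stand-in for _assert(): only counts the failure. */
static void record_failure(const char *file, int line)
{
    (void)file;
    (void)line;
    failures++;
}

/* Same shape as the quoted definition: one statement, no braces. */
#define CHECK(EX) if (EX) ; else record_failure(__FILE__, __LINE__)

static int classify(int n)
{
    /* The else below pairs with the outer if, not with the one in CHECK,
       because the if inside CHECK already has its own else. */
    if (n >= 0)
        CHECK(n < 100);
    else
        return -1;
    return 0;
}
```

A macro written as a braced block followed by a semicolon would not
compile in the same position, since the semicolon would end the outer
`if` before its `else`.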

As for

>As various people (including me) have pointed out, modifying (say) the
>OS/360-aka-MVS linker is politically impossible, however desirable and
>technically-simple it may be.

that linker does not restrict one to 6 character externals, and so I wish
Henry would quit mentioning IBM (and DEC, which allows 32 character
externs in VMS) in that context.  People might question this limitation
a bit more if they realized that it was accounting for GCOS, not MVS.
Also, several methods of accounting for such limited environments
without modifying the host linker have been proposed that Henry has not
addressed (mostly he has indulged in condescending attacks).
The major criticism of these methods has been their inconvenience,
especially for debugging.  However, such inconvenience *on these specific
systems* must be weighed against the costs to *all* C developers of a
six-character limit.
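
To make that cost concrete, a rough sketch of how two distinct names
become indistinguishable when only a prefix is significant.  The function
same_external and the names tested are made up for illustration, and case
handling is ignored:

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero when a and b are indistinguishable in their first
   `significant` characters (hypothetical helper for illustration). */
static int same_external(const char *a, const char *b, size_t significant)
{
    return strncmp(a, b, significant) == 0;
}

/* Small self-check with made-up names. */
static void check_same_external(void)
{
    assert(same_external("symbol_insert", "symbol_delete", 6));
    assert(!same_external("symbol_insert", "symbol_delete", 32));
}
```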

-- Jim Balter, INTERACTIVE Systems (ima!jim)

------------------------------

Date: Sat, 12 Jan 85 11:19:48 est
From: cbosgd!pegasus.UUCP!hansen
Subject: Standard C Digest - V2 #6
To: cbosgd!std-c, ihnp4!utzoo!henry

Re: #endif token-string

< Henry Spencer @ U of Toronto Zoology
< Interestingly enough, I find *no* occurrences of the trouble-causing
< syntax in rmail, deroff, eqn, learn, lint, nroff, refer, struct, troff,
< or uucp on my system.  A quick inspection of the System V sources (we
< have, but don't run, System V) also comes up empty.  So, this change
< breaks Berklix and only Berklix programs; everybody else has been
< following the CRM, which makes no provision for trailing tokens on
< #else and #endif.  This is a non-standard and implementation-dependent
< extension.

I disagree. I too did a grep of the System Vr2 sources and found a number of
occurrences of this useful construct. In particular, the system header files
curses.h, term.h and sys/xtproto.h all used this construct. So it isn't just
Berklix programs that get broken. (Note that the curses.h is AT&T's (Mark
Horton's) version of curses.h and NOT Berkeley's version.)

I too use this construct in most of my programs and would have to make
considerable changes to get my code to compile under ANSI unless this
restriction were lifted.

Besides, what does it hurt to lift the restriction?

					Tony Hansen
					pegasus!hansen

------------------------------

Date: 12 Jan 85 23:59:14 CST (Sat)
From: cbosgd!ihnp4!utzoo!henry
Subject: #endif token-string
To: ihnp4!pegasus!hansen

> I disagree. I too did a grep of the System Vr2 sources and found a number of
> occurrences of this useful construct. In particular, the system header files
> curses.h, term.h and sys/xtproto.h all used this construct.

My grep was on SysV, not SysV.2, since we don't have SysV.2 yet.

I also observe that the examples you cite are in an area where the
Berkeley influence on SysV.2 has been strongest.

> I too use this construct in most of my programs and would have to make
> considerable changes to get my code to compile under ANSI unless this
> restriction were lifted.
> 
> Besides, what does it hurt to lift the restriction?

My point was not that I'm opposed to the trailing-tokens notion -- I tend
to agree that it's a reasonable thing, although I would like to see some
restrictions (e.g. "one identifier only") for error catching -- but that
existing code which uses this construct is relying on an implementation-
dependent local extension.  While it would be nice if the ANSI standard
didn't break your code, I don't see that you have a legitimate cause for
complaint if it does.  People who want their code to be portable simply
have to pay attention to such issues, and avoid nonstandard extensions.
Even if it hurts.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

------------------------------
End of mod.std.c Digest - Mon, 28 Jan 85 20:33:21 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> replies to cbosgd!std-c at BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) below.
-- 
Orlando Sotomayor-Diaz	/AT&T Bell Laboratories, Red Hill Road
			/Middletown, New Jersey, 07748 (HR 1B 316)
Tel: 201-949-9230	/UUCP: {ihnp4, houxm}!homxa!osd7