draft ANSI standard: trigraphs rear their ugly heads again

John Gilmore gnu at hoptoad.uucp
Tue Dec 2 16:27:54 AEST 1986


[This is posted to comp.lang.c because mod.std.c seems to be dead.  Love
those mod groups!]

The committee did not want to tie C to ASCII.  Fair enough.  What they
did was require that all the relevant characters be in the character
set (section 2.2.1), but not say anything about their character encoding.
In fact, you could compile source code in ASCII to run on a machine
that uses EBCDIC in the runtime environment.  This is great.

The problem is that they went further and tried to define a way to represent
all the relevant characters in all the ISO code sets used in Europe.
Since various countries reuse #, [, {, }, ], \, |, ~, and ^ as letters
and such, they have defined three-character sequences that can be used
to represent these characters.

Now, these are major characters in the language.  The preprocessor
prefix #.  The block structuring construct { }.  The array subscripter
[ ].  And the ultimate escape character \, as well as a bunch of
logical ops.

My question is this.  Is a C program that is written in plain old ASCII,
using the above characters, portable?  Is it a "strictly conforming program"?
Is every ANSI standard C compiler in the world required to read in such
a program and translate it properly?

Next question.  Is a C program that uses local letters outside character
strings (e.g. as letters in French or Swedish identifiers) portable?
Is it a "strictly conforming program"?  Are there ANY C compilers anywhere
in the world which will read in such a program and translate it properly?

My preliminary answers are:  C programs that use ASCII characters had
damn well better be strictly conforming, or every C program in the world
is broken.  C compilers on European machines could support the national
letters in identifiers and such, but any program that used this feature
would not be portable.

Since a European C compiler which supported using the local characters
AS LETTERS would encourage unportable code, it would be better to make
European C compilers which did not support using the local characters
as letters.  This is tough, but are we trying to be nice or are we
trying to encourage portability?  Since the specific intent of the
standard is to promote portability, features in the standard which
encourage the generation of nonportable code should be questioned.
Newly introduced features discouraging portability should be removed.

Now.  If European C compilers do not support using the local characters
as letters, and don't support using them as ASCII punctuation, everyone
in Europe will be forced to write their code using trigraphs.

Of course, any code written in North America or the UK will use ASCII
characters, so the Europeans will have to write a program to translate
the imported {, }, etc. into trigraphs.

I think that a better solution is for the European compilers to support
these character codes to mean what they mean in ASCII.  Now imported
sources can be compiled directly.  Also, Europeans would have the choice of
editing the ASCII sources rather than using trigraphs.  The programs
will look funny on local terminals, but I don't see how it can be
harder to read a program filled with local letters as punctuation, than
it can be to read a program that looks like:

??=include <stdio.h>

main(argc, argv)
	int argc; char **argv;
??<
	char buf??(??) = "Hello, world!??/r??/n";

	if (feof(stdin) ??!??! argc != 0) ??<
		printf(buf);
	??>
??>

Since the trigraphs are even uglier than the alternative, and since
European compilers will not be able to use those character codes for
anything else, there is no need for introducing the trigraphs.  "The
X3J11 charter clearly mandates the committee to *codify common existing
practice*" (emphasis theirs -- Rationale, pg. 1).  The committee's 
justification for ignoring common practice here is too weak.  The
trigraphs should be removed.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore at lll-crg.arpa
    "I can't think of a better way for the War Dept to spend money than to
  subsidize the education of teenage system hackers by creating the Arpanet."