trigraphs in X3J11

Dick Dunn rcd at ico.ISC.COM
Thu May 26 02:23:08 AEST 1988


Thanks to Doug Gwyn for some answers on trigraphs.  Unfortunately, the more
I learn, the less I like them...but that's not Doug's fault.  >>=me, >=Doug

> >1.  Replacement within strings:  This is a change to the existing language.
> >    It breaks existing programs.  ...
> >    Point:  The sequence "??" is not at all rare.
> Trigraphs ARE relatively rare in existing code.  Yours is the first
> example I've seen, in fact.  Most applications think ? should be used
> as a question mark in messages, perhaps ?? at the end of a few message
> strings or in a chess program.

Wait.  I said that ?? (the trigraph introducer, if you will) is not at all
rare, and this is easy to confirm.  Occurrences of ?? are important because
they represent situations where the next character could cause trouble.

Go look at source code!  If you're on a UNIX system, find some source and:
	find . -name '*.[ch]' -exec grep '??' '{}' ';'
I suggest that you look for all ?? instead of just trigraphs so that you
can get an appreciation of where ?? appears.
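Not every ?? is a problem.  Only a pair followed by one of the nine characters = ( / ) ' < ! > - becomes a trigraph.  As a rough aid for sorting through the grep output, here is a minimal C sketch that counts the ?? pairs in a line and how many of them the draft would actually replace.  The helper names are made up for illustration and are not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c, following two question marks, would complete a trigraph
   under the draft: one of = ( / ) ' < ! > - */
static int completes_trigraph(char c)
{
    /* strchr also "finds" the terminating '\0', so exclude it explicitly. */
    return c != '\0' && strchr("=(/)'<!>-", c) != NULL;
}

/* Hypothetical helper: count every pair of question marks in a line,
   overlapping pairs included (as grep would see them), and how many of
   those pairs are followed by a trigraph-completing character. */
static void count_qq(const char *line, int *pairs, int *trigraphs)
{
    size_t i;

    assert(line != NULL && pairs != NULL && trigraphs != NULL);
    *pairs = 0;
    *trigraphs = 0;
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] == '?' && line[i + 1] == '?') {
            (*pairs)++;
            /* line[i + 1] is '?', so line[i + 2] is at worst the terminator. */
            if (completes_trigraph(line[i + 2]))
                (*trigraphs)++;
        }
    }
}
```

For example, a line containing (???) has two overlapping pairs but only one trigraph, the final ??).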

When I first found trigraphs, I said "WTF??!" and immediately looked at my
own source code.  I found one conflict.  So I went to a UNIX source tree
and found several occurrences in Sys V code.  More poking around turned up
scattered others--some netnews source, some networking stuff.  There
aren't a lot of them, but they *do* exist.

I would have expected the committee to do as I did: search large piles of
source code for conflicts.  It took me only a little while one evening.  A
few patterns recur: ??! as an expletive, and (???) for a questionable item.
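To make the quiet change concrete, here is a minimal sketch of the draft's replacement rule applied to a single string: scan left to right and replace each ?? followed by one of the nine final characters.  It models only the text substitution, not a whole compiler, and the function names are hypothetical.  Inside C literals, a question-mark pair is written ?\? (the \? escape exists so a literal pair can be written safely).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replacement for the character that completes a trigraph, or 0 if c
   does not complete one.  The nine mappings are those of the draft. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Sketch of the replacement step on one string: scan left to right,
   replacing each trigraph and copying everything else unchanged.  The
   output is never longer than the input, so out needs only
   strlen(in) + 1 bytes. */
static void replace_trigraphs(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* in[2] is read only when in[0] and in[1] are both '?', so it is
           at worst the terminator. */
        char r = (in[0] == '?' && in[1] == '?') ? trigraph_replacement(in[2]) : 0;

        if (r != 0) {
            *out++ = r;   /* three source characters become one */
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Fed the examples above, WTF??! comes out as WTF| and (???) as (?]: the first pair is followed by another ?, so only the last pair forms a trigraph.  Likewise "stuff??/n" becomes the source text "stuff\n", which the compiler then reads as an ordinary newline escape.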

The following is NOT meant as a flame against Doug (who has stuck his neck
out to explain some of what has gone on), but I think the committee reneged
on its responsibilities in putting trigraphs in.  From the X3J11 rationale:

| The X3J11 charter clearly mandates the Committee to _codify_existing_
| _practice_.  (emphasis in original; "_" marks italics)
|  ...
| Existing code is important.
|  ...
| Avoid "quiet changes."

Trigraphs are not existing practice; apparently they have never even been
tried out in real use!  They break existing code as a "quiet change."
There are real examples of code currently in use that will be "broken"
if recompiled by a compiler conforming to this part of the draft standard.

> >    What I don't understand is why it was decided to
> >    introduce a brand-new (I assume) mechanism which breaks existing code.
> Because nobody, including you, has proposed anything that the Committee
> agreed was better,...

I intentionally avoided any sort of counterproposal in the first posting
because I wanted to focus on what the committee had done and why; I didn't
want to start with a debate over anything I would propose.

I have a philosophical view that we would be better off with no solution
to this problem than with a clumsy solution that breaks existing code.  (I
don't agree that "a bad solution is better than none at all.")  There are
other areas where X3J11 said "there's no prior art" and/or deferred a
problem to future extension work.

Trigraphs in strings are the important issue; trigraph symbols in code are
ugly but don't break anything.  So, just for the sake of argument I'll toss
out some ideas for strings:  There is already one form for an alternate
interpretation of the mapping of a literal character or string into its
memory representation, namely L"stuff" for wide chars and strings.  Why not
use the same model--say, precede the string with R for restricted or T for
trigraph; thus R"stuff??/n" would mean R"stuff\n".  Even if you think
L"stuff" is a mistake, this would only be a second occurrence of the same
class of mistake.  (Karl Heuer noted that L"stuff" is a quiet change too,
but it's highly unlikely to hit; I've found no occurrences.)
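The R"..." idea is only a proposal, not C, so there is no compiler to try it on.  As a rough illustration of the opt-in model, assume a hypothetical source filter run before compilation.  It expands trigraphs only inside R-prefixed literals (dropping the R) and copies everything else unchanged, including ordinary strings.  This is a simplified sketch: it does not recognize comments or character constants, and the function names are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same nine mappings as the draft's trigraphs; 0 if c completes none. */
static char tri(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical filter for the R"..." proposal: inside an R-prefixed
   literal, trigraphs are expanded and the R is dropped; all other text,
   including ordinary string literals, is copied unchanged.  Simplified:
   comments and character constants are not recognized.  out needs at
   most strlen(src) + 1 bytes. */
static void expand_r_literals(const char *src, char *out)
{
    int prev = 0; /* previous character outside literals (0 at start) */

    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        /* An R directly before a quote, not ending an identifier. */
        int r_lit = src[0] == 'R' && src[1] == '"' &&
                    !(isalnum((unsigned char)prev) || prev == '_');

        if (r_lit || src[0] == '"') {
            int escaped = 0;

            if (r_lit)
                src++;            /* drop the R prefix */
            *out++ = *src++;      /* opening quote */
            while (*src != '\0') {
                char c = *src;
                size_t len = 1;

                if (r_lit && src[0] == '?' && src[1] == '?' && tri(src[2]) != 0) {
                    c = tri(src[2]);
                    len = 3;
                }
                *out++ = c;
                src += len;
                if (escaped)
                    escaped = 0;  /* this character was escaped */
                else if (c == '\\')
                    escaped = 1;
                else if (c == '"')
                    break;        /* closing quote */
            }
            prev = '"';
        } else {
            prev = (unsigned char)*src;
            *out++ = *src++;
        }
    }
    *out = '\0';
}
```

Under this model, puts(R"stuff??/n") comes out as puts("stuff\n"), while an existing puts("WTF??!") passes through untouched.  Old code cannot change meaning unless someone adds the R.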

As I said, that was JUST a proposal for the sake of argument.  You might
equally well construct names for the problem characters and build them
into a header file; then construct strings by the compile-time concatenation
business.  There are other ways.  YES, they're ugly, BUT they don't
have to break existing code, while the draft standard method is ugly AND
breaks code.
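The header-file alternative can be sketched in a few lines of C.  The macro names below are hypothetical.  The header is assumed to be written once by someone (say, the implementor) whose character set has the real characters.  User strings are then built from the names by adjacent-literal concatenation, with no ?? anywhere in the source.

```c
#include <assert.h>
#include <string.h>

/* Contents of a hypothetical header, say "charnames.h": one name per
   character that some national character sets lack.  Assumed to be
   written once by someone (e.g. the implementor) who has the real
   characters. */
#define S_HASH      "#"
#define S_BACKSLASH "\\"
#define S_LBRACKET  "["
#define S_RBRACKET  "]"
#define S_LBRACE    "{"
#define S_RBRACE    "}"
#define S_BAR       "|"
#define S_CARET     "^"
#define S_TILDE     "~"
#define S_NEWLINE   "\n"

/* User code: adjacent string literals are joined at compile time, so
   this is a single ordinary string constant built only from names. */
static const char *const usage_msg =
    "usage: prog " S_LBRACKET "-a" S_BAR "-b" S_RBRACKET S_NEWLINE;
```

It is clumsy to read, but no existing string changes meaning.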

What about an ISO 8859 character set?  Wouldn't that cover a lot of the
problem area?

>...and many C users (for example, Europeans) have a
> perceived need that the parochial American outlook does not meet.

I understand their need.  I agree that it's "parochial" to ignore the
problem, but I don't think it's parochial to say "we don't have a good
solution yet, so let's not cast a bad one in concrete."  =>What do Europeans
do about C now?<=  Is there NO prior art?  If there is none, it's certainly
not ready to be standardized!

> >Has the trigraph mechanism been tried out, in real practice, anywhere
> >prior to the introduction in X3J11?
> This specific mechanism is an invention of X3J11, so far as I can
> determine.  However, use of multi-byte sequences to encode things
> that cannot be represented by a single byte is extremely common
> practice.

I know that multi-byte sequences are common--I worked with 370ish Pascal
quite a while back, and we had to use digraphs for about six characters.
These digraphs became part of the Pascal standard, BUT there's a big
difference: the digraphs were established practice long before the
standard was done.  They were in use, known to be practical (if ugly),
and didn't break anything on machines that didn't need them.

It is also clear that you don't get very far trying to invent believable
digraphs for C, so you need trigraphs if you go that route.  The objection
is that they haven't been tried out.  You're standardizing something you
haven't really used in practice, and since C is not Ada (oops; sorry:-),
that's just not wise.

> Note, by the way, that I oppose trigraphs, but I can provide a definite
> explanation of how the European needs can be met without them...

Then I wish folks had pushed against them harder.  (Maybe you did, Doug; I
don't know.)
-- 
Dick Dunn      UUCP: {ncar,cbosgd,nbires}!ico!rcd       (303)449-2870
   ...If you get confused just listen to the music play...


