Character Sets (was Re: trigraphs)

Thu May 4 01:49:34 AEST 1989

In article <373 at cybaswan.UUCP> iiit-sh at cybaswan.UUCP (Steve Hosgood) writes:
>Several people have been talking about Trigraphs recently...
>Now IMHO, we're seeing here the consequences of restricting the world's
>computer users to a 7-bit coding system originally designed just for American
>English. Surely it would be better for ANSI to scrap formally the concept of
>7-bit coding and move to better things? ...

You've got the problem exactly backwards.  ANSI C, and most other language
projects now current, are perfectly happy to assume 8-bit character sets.
The problem is that the *complainers* have 7-bit equipment that uses a
different 7-bit standard, and *they* don't want to be forced to upgrade.
They want officially-blessed, easy-to-read ways to write ANSI C using
their own old equipment.  (What next, an ANSI C encoding for the IBM Model
26 keypunch?!?)

There is in fact a standard family of 8-bit character sets, the ISO Latin
sets, that solves this problem completely -- each one has full ASCII as a
subset.  ISO Latin 1, in particular, covers essentially all the Western
European languages (there is some small problem with Welsh that slipped
through by accident).  There are standard shift sequences to reach other
alphabets.
(Although shifts are an enormous pain in string manipulation, which is
why ANSI C recognizes the notion of "wide character" to deal with such
things internally as unshifted codes.)  Someday the terminals etc. will
speak ISO Latin, and that will solve this set of problems.  (Then we'll
have the oriental languages to deal with... the existing code-extension
hooks can cope in theory, but in practice it's cumbersome.)
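
To make the shift-versus-wide-character point concrete, here is a rough
sketch in C.  The encoding is a toy one invented for illustration, not any
real ISO scheme: SO (0x0E) switches to a hypothetical alternate set, SI
(0x0F) switches back to ASCII, and a shifted byte b is arbitrarily given
the unshifted code 0x100 + b.  The name decode_shifted() is likewise made
up.  A real program would more likely go through the locale-dependent
mbstowcs() and wchar_t.

```c
#include <stddef.h>

#define SHIFT_OUT 0x0E  /* SO: switch to the (hypothetical) alternate set */
#define SHIFT_IN  0x0F  /* SI: switch back to plain ASCII */

/*
 * Decode a shift-encoded byte string into unshifted codes.
 * Shift bytes produce no output; they only change the decoder state.
 * Returns the number of codes written, which is at most max.
 */
size_t decode_shifted(const unsigned char *in, size_t n,
                      unsigned short *out, size_t max)
{
    int shifted = 0;    /* current shift state; starts in ASCII */
    size_t count = 0;
    size_t i;

    for (i = 0; i < n && count < max; i++) {
        if (in[i] == SHIFT_OUT) {
            shifted = 1;
        } else if (in[i] == SHIFT_IN) {
            shifted = 0;
        } else {
            /* Toy mapping: shifted bytes land in the range 0x100-0x1FF. */
            out[count++] = (unsigned short)(shifted ? 0x100 + in[i] : in[i]);
        }
    }
    return count;
}
```

In the shifted form, the meaning of a byte depends on every shift byte
that came before it, so you cannot index, split, or compare in the middle
of a string without rescanning from the start.  In the decoded array each
element stands on its own, which is the whole attraction of handling such
text internally as unshifted codes.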
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry at zoo.toronto.edu