Character Sets

Steve Hosgood iiit-sh at cybaswan.UUCP
Tue May 23 23:20:50 AEST 1989


In article <10284 at smoke.BRL.MIL> gwyn at brl.arpa (Doug Gwyn) writes:
>However, X3J11 did mandate that the character values for '0'..'9' have
>adjacent values in ascending numerical order.  That is clearly a code
>set requirement, which I argued against.  The need for some way to
>map digit characters to numbers and vice versa does exist, but other
>means to meet this need could have been specified.

Seems like a job for <ctype.h> to me. Interesting, though: I had never
considered the possibility of non-contiguous digits and alphabetics rearing
their heads now that EBCDIC is dead (slight :-)).
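
To make the distinction concrete, here is a rough sketch of what the
contiguity guarantee buys you for digits, and why the same trick is not
safe for letters. The names digit_value and letter_index are made up for
illustration; only the <ctype.h> and <string.h> calls are standard.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of a decimal digit character, or -1.
   Relies only on '0'..'9' being contiguous and ascending. */
int digit_value(int c)
{
    return isdigit((unsigned char)c) ? c - '0' : -1;
}

/* Hypothetical helper: 0-based position of a letter in the alphabet,
   or -1. Uses a lookup string rather than c - 'a', because letters
   need not be contiguous (they are not in EBCDIC). */
int letter_index(int c)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (!isalpha((unsigned char)c))
        return -1;
    p = strchr(alphabet, tolower((unsigned char)c));
    return p ? (int)(p - alphabet) : -1;
}
```

With these, digit_value('7') gives 7 on any conforming implementation,
while letter_index never assumes anything about the code set beyond what
<ctype.h> reports.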

>>The 'UCASE' hack to allow UN*X to work on silly old terminals was put
>>into the TTY handler. So I believe should this trigraph thingy.
>Not every system has such facilities, but I agree with your general
>sentiment.  In fact I expect that some of the more enlightened
>implementors will take exactly this tack to deal with practical use
>of so-called "European character sets".

But if this trigraph thing gets into the standard, then *all* conforming
compilers will *have* to have the code in their lexical analysers. As you
say, enlightened (:-)) implementors will probably deal with the problem in
the handler, but the compiler carries the baggage around for evermore *as well*.
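
For a feel of how much baggage that is, the following is a minimal
sketch of the replacement step a lexical analyser would need, assuming
the nine trigraphs of the draft (two question marks followed by one of
= ( / ) ' < ! > -). The function name replace_trigraphs is hypothetical,
and a real compiler would do this on its input stream rather than on a
string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing each trigraph with
   the single character it stands for.
   `out` must have room for at least strlen(in) + 1 bytes. */
void replace_trigraphs(const char *in, char *out)
{
    /* third character of each trigraph, and the character it maps to */
    static const char third[]  = "=(/)'<!>-";
    static const char mapped[] = "#[\\]^{|}~";

    while (*in != '\0') {
        const char *hit = NULL;

        /* only look up the third character if two '?' precede it */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(third, in[2]);
        if (hit != NULL) {
            *out++ = mapped[hit - third];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

It is not a lot of code, but it has to run before anything else in the
lexer sees the source, which is rather the point of the complaint.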

>The new ISO code set standards should also help.

I certainly hope so. Presumably the C standard allows for 8-bit character sets?
Also, what about such things as the allowable characters in identifiers and
suchlike? Just yesterday, I was writing a program where I would have liked to
use Greek characters as identifiers. Is that sort of thing permissible?
Would 'toupper' return upper-case Epsilon if given lower-case epsilon as an
argument?
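
One way to poke at the 'toupper' question, sketched under some loud
assumptions: that the implementation provides setlocale() and LC_CTYPE
from <locale.h>, that a Greek 8-bit locale is installed (the name
"el_GR.ISO-8859-7" below is only a guess and varies between systems),
and that epsilon is 0xE5 and Epsilon 0xC5 as in ISO 8859-7. The helper
name toupper_in_locale is made up.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: upper-case the 8-bit character code `c` under
   the named locale. Returns -1 if the locale cannot be selected.
   Leaves LC_CTYPE set to "C" on return. */
int toupper_in_locale(const char *locale_name, int c)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    result = toupper((unsigned char)c);
    setlocale(LC_CTYPE, "C");
    return result;
}
```

In the "C" locale the call leaves 0xE5 alone, since only the 26 Latin
letters have case there. Whether toupper_in_locale("el_GR.ISO-8859-7",
0xE5) comes back as 0xC5 depends entirely on the locale data the
implementation ships.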

It's a tricky can of worms, and it gets worse the closer you look at it.
Steve


