Internationalization

Mark HAHN markha at microsoft.UUCP
Thu Mar 1 14:46:06 AEST 1990


There is no good answer yet. I offer a few things to be aware of:
 
CHARACTER CODING
Don't make assumptions about how "character" data is coded.
Once upon a time, all characters were ASCII, that is, codes 0 to 127.
Nowadays the minimum is 8-bit characters, with much movement
toward multibyte characters or 16-bit characters.
As far as I can see, there are no definitive or complete standards.
(For instance: what is the format of a locale string? Are wchar_t
values supposed to be portable? Are multibyte characters or wchar_t
values valid in file names?)
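As a small illustration of not assuming one byte per character, here is a sketch in ANSI C that counts characters with the standard mblen() function instead of strlen(). The function name mb_char_count is hypothetical, and the result depends on the LC_CTYPE locale chosen with setlocale(); in the default "C" locale every byte counts as one character.

```c
#include <stdlib.h>   /* mblen */
#include <string.h>   /* strlen */

/* Count the characters (not bytes) in a multibyte string under the
 * current LC_CTYPE locale. Returns -1 if an invalid sequence is found.
 * strlen() counts bytes, which differs from the character count
 * whenever the locale uses a multibyte encoding. */
long mb_char_count(const char *s)
{
    long count = 0;
    size_t remaining = strlen(s);

    mblen(NULL, 0);                /* reset any shift state */
    while (remaining > 0) {
        int len = mblen(s, remaining);
        if (len <= 0)              /* -1 means an invalid or incomplete sequence */
            return -1;
        s += len;
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

Under a multibyte locale, the same string can give a smaller count than strlen(), which is exactly the assumption to avoid.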

PRESENTATION PREFERENCES
You should also avoid assumptions about language- and country-specific
behavior such as sort order, upper/lower casing, and date, time, number,
and currency formats.
To be truly virtuous, you can't even assume text directionality!
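One way to avoid hard-coding some of these preferences is to delegate them to the ANSI C locale functions. The sketch below (function names are hypothetical) uses strcoll() for sort order, the %x conversion of strftime() for dates, and localeconv() for the decimal separator. It assumes the program calls setlocale(LC_ALL, "") once at startup to pick up the user's environment; without that call the "C" locale applies. Text directionality is outside what ANSI C covers.

```c
#include <locale.h>  /* localeconv */
#include <string.h>  /* strcoll */
#include <stdlib.h>  /* qsort */
#include <time.h>    /* strftime, struct tm */

/* qsort callback: strcoll() orders strings by the LC_COLLATE rules of
 * the current locale instead of by raw byte value as strcmp() does. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, compare_collated);
}

/* Format a date using the locale's preferred representation (%x)
 * rather than a fixed "MM/DD/YY" pattern. Returns 0 if buf is too small. */
size_t format_local_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%x", when);
}

/* Report the locale's decimal separator instead of assuming '.'. */
const char *decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

In the "C" locale these behave like strcmp(), "%m/%d/%y", and "."; the point is that the same code adapts once a different locale is selected.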

ISOLATE MESSAGES FROM CODE
Keep all user-visible strings in a separate file. If nothing else,
just put them in one big array and refer to them
by symbolic indices. Various OSs offer better (?) or more elaborate
support than this: X/Open message catalogs, Mac/Windows/PM resources,
OS/2 message files.
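A minimal sketch of the "big array with symbolic indices" approach in C. The enum constants, the message texts, and the msg() accessor are all hypothetical; a real program would keep the table in its own source file so it can be replaced per language without touching program logic.

```c
/* Symbolic indices for every user-visible message (hypothetical names).
 * Code refers only to these, never to literal strings. */
enum msg_id {
    MSG_FILE_NOT_FOUND,
    MSG_OUT_OF_MEMORY,
    MSG_SAVED_N_FILES,
    MSG_COUNT               /* number of messages; keep last */
};

/* The whole message table lives in one place, so a translator can
 * supply a replacement table for another language. */
static const char *const messages[MSG_COUNT] = {
    "File not found: %s\n",
    "Out of memory\n",
    "Saved %d file(s)\n"
};

/* Look up a message; returns "" for an out-of-range index. */
const char *msg(enum msg_id id)
{
    if ((unsigned)id >= MSG_COUNT)
        return "";
    return messages[id];
}
```

A call site then reads `printf(msg(MSG_SAVED_N_FILES), n);` and never embeds the English text. One caveat: a translation may need to reorder arguments, which plain ANSI C printf() cannot express; X/Open printf() adds positional conversions such as %1$s for that purpose.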

ON THE HORIZON
There are a number of promising directions. Unicode is one of them:
a 16-bit character set that can represent everything uniformly.
I don't know of any promising ideas for managing messages, though.
Internationalization is not glamorous, which is why the various Unix groups
estimate 1992 for shipping international support. Just remember that someone,
probably not the original author, will be trying to translate those messages.

Maybe the real benefit of iconic or direct-manipulation user interfaces 
is the smaller number of messages...

regards,
Mark Hahn
-- 
Mark Hahn	microsof!markha at uunet.uu.net	uunet!microsof!markha
I don't speak for Microsoft.



More information about the Comp.std.c mailing list