Internationalisation (was: NULL as a string terminator)

Richard A. O'Keefe ok at goanna.cs.rmit.oz.au
Sun Aug 26 21:38:23 AEST 1990


In article <1990Aug24.064203.20942 at icc.com>, cbp at icc.com (Chris Preston) writes:
> If you reaaaaaly want the text in the source section (incidentally, xscc on
> System V [your original example] does invoke the C preprocessor

No, xscc was *not* my example nor anyone else's in this thread before this.
I mentioned System V Release 4, to be sure, but I did not mention xscc.
How on earth is using xscc supposed to help me use the same message
file for C, Pascal, Fortran, and Lisp?

> so text substitution is absolutely not broken under MNLS

Whoever said it was?

> Another method would be to do something like the following (assuming that
> you are invoking the C preprocessor):

> #define DCOM_ERR	0
> #define DRVR_ERR	1  /* etc. etc. */

> char *ErrMsg[]={
> #if DOS
> 				"Run dcom.com",
> 				"Run driver.com",
> #elif UNIX
> 				"Datacomm not initialized, contact S/A",
> 				"Driver error, contact S/A",
> #else
> 				"Datacomm not running",
> 				"Driver not responding",
> #endif
> };

Again, this technique means that you need the sources, and that to
change the messages you need access to the sources and to recompile.
That was an objection validly raised against the stripped-down message
file technique I posted, and it applies with greater force to this.

> So, we have accomplished coding for purposes of internationalization,
> either way, we have separated string literals to a central place,
> and we have made the code more maintainable, since changes in messages for
> the environment can occure at one major juncture, and life is a cabaret.

The point of a message file is that
 -- the "central place" is OUTSIDE THE PROGRAM
 -- a message file can be got at by someone with no (other) access to sources
    (this is a *big* deal for developers!)
 -- *one* version of the object file can be shared by people using
    *different* message files.

> >As for efficiency, the point is that we are talking about a scheme for
> >generating messages for display to humans.  The cost of fishing the text
> >out of a file is (or was every time I measured it) considerably less than
> >the cost of displaying it on the terminal.
> 
>    Considering the program that pays no concern for "internationalization" 
>    does not have to source anything external to it's data segment at any 
>    time other than normal operations, to say that the additional overhead is 
>    equal to or less than existing overhead is a non-sequitor.  If you 
>    don't do it the cost ain't there.

That's non sequitUr, and this "rebuttal" is badly flawed.
What I claimed was
	(cost of fetching message) << (cost of displaying message)
Someone with measurements to disprove this can refute me (for a particular
hardware/software combination) by displaying his figures.  Of course, what
is *really* interesting about this "rebuttal" is that in a virtual memory
environment it simply isn't true.  We're talking about messages here,
things which are displayed at relatively infrequent (we hope!) intervals.
Text, in short, which is paged OUT.  In a system which supports memory-
mapped files (VMS, Aegis, SunOS 4.x, AIX, ...) one could open the message
file as a memory-mapped file, and then the process of fetching a message
from the message file would cost no more than the process of fetching a
message from a pre-initialised character array, because the two would be
exactly the same process.
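
A rough sketch of the memory-mapped variant, assuming a POSIX-style mmap()
(the other systems named have their own interfaces) and a hypothetical
one-message-per-line file.  The names map_message_file() and mapped_message()
are invented for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole message file read-only.
 * Returns the mapped address and stores its size in *len,
 * or returns NULL on failure (including an empty file). */
const char *map_message_file(const char *path, size_t *len)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);          /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return (const char *)base;
}

/* Return a pointer to message `index` (0-based) inside the mapped text and
 * store its length, excluding the newline, in *msg_len.  The text is NOT
 * NUL-terminated.  Returns NULL if there is no such message. */
const char *mapped_message(const char *base, size_t len, int index,
                           size_t *msg_len)
{
    const char *p = base;
    const char *end = base + len;
    const char *nl;

    if (index < 0)
        return NULL;
    while (index > 0 && p < end) {
        nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            return NULL;
        p = nl + 1;
        index--;
    }
    if (p >= end)
        return NULL;
    nl = memchr(p, '\n', (size_t)(end - p));
    *msg_len = (size_t)((nl != NULL ? nl : end) - p);
    return p;
}
```

Once the file is mapped, looking up a message is plain pointer arithmetic
over memory; the pages holding the text are brought in on demand, just as
the pages of an initialised array in the data segment would be.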

>   It has been pointed out here by several that are in the know on these
>   things, that arguing about string literals is moot in comparison to other
>   inherent difficulties presented by internationalization, and that the
>   necessary crusade to "C programming practices" is long a commin'.

That is why, for example, ANSI C has
	wchar_t
	wcstombs()
	mbstowcs()
	mblen()
and so on, and why it is set up to allow multi-byte characters in
constants.
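
A small sketch of how those functions fit together, using only <stdlib.h>.
The helper names mb_char_count() and round_trip() are hypothetical, and the
128-element buffer is an arbitrary limit chosen for the example.

```c
#include <stdlib.h>

/* Count characters (not bytes) in a multibyte string under the current
 * locale.  Returns -1 if an invalid sequence is found. */
long mb_char_count(const char *s)
{
    long count = 0;

    mblen(NULL, 0);                     /* reset any shift state */
    while (*s != '\0') {
        int n = mblen(s, MB_CUR_MAX);
        if (n <= 0)
            return -1;
        s += n;
        count++;
    }
    return count;
}

/* Convert a multibyte string to wide characters and back into `out`.
 * Returns 0 on success, -1 on an invalid sequence or if either
 * buffer is too small to hold the result with its terminator. */
int round_trip(const char *s, char *out, size_t out_size)
{
    wchar_t wide[128];
    size_t n = mbstowcs(wide, s, 128);

    if (n == (size_t)-1 || n == 128)
        return -1;
    n = wcstombs(out, wide, out_size);
    if (n == (size_t)-1 || n == out_size)
        return -1;
    return 0;
}
```

In the default "C" locale every character is one byte, so the count equals
strlen(); the difference only shows after setlocale() selects a locale with
a multibyte encoding.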

> >There's four negative impacts of the #ifdef approach, just for starters.

>   Given the above examples, do you still feel this to be the case?

Of course.  Those four negative impacts still stand.

>   I do not think so.  I also believe that this shows that it is an unsafe
>   practice to say that something cannot be done within the framework of C
>   and the C preprocessor. 

Again, who said _that_?  Not me!  That there are *better* ways to do some
things than using the C preprocessor, who can challenge that?  The only
question is, _which_ tasks?  Given that I said I would like to share
message files between several programming languages, using a facility
peculiar to one of them (there is no guarantee that /usr/lib/cpp, or
anything like it, will be available) would be rather silly, wouldn't it?

A serious problem concerned with "the need to make the texts we write for
the tools that count work with more than one tongue of men" (otherwise
known as "internationalisation" if you have no fear of words that have
more than one sound in them) is that C formats don't quite work.  One
common problem is that different languages put phrases in different orders.
The X/Open answer to that is to have an extra piece of information in
%format controls, saying which argument to use.  I presume that the ANSI C
committee considered that, and didn't include it because it basically needs
pointers and integers to be the same size.
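
As an illustration of the X/Open convention: a conversion written "%2$s"
takes the second argument after the format, whatever its position in the
text.  The sketch below relies on that extension, which ISO C's printf
family does not guarantee; format_report() and the sample wording are
made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: build a report from a format string that would
 * normally come out of a message file.  The arguments are always passed
 * as (file, dir); the format decides the order in which they appear.
 * Returns what snprintf() returns. */
int format_report(char *buf, size_t size, const char *fmt,
                  const char *file, const char *dir)
{
    return snprintf(buf, size, fmt, file, dir);
}
```

Given the arguments ("a.c", "/src"), the format "%1$s not found in %2$s"
yields "a.c not found in /src", while "%2$s: %1$s missing" yields
"/src: a.c missing".  The calling code is identical in both cases; only the
text in the message file differs.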


The following suggestion is not altogether serious.  But bearing in mind
things like wanting to put phrases in different orders, and all sorts of
things one might like to let customers configure for themselves (without
having to give them *all* the sources), it might not be as crazy as it
sounds.
	How about using TCL (Tool Command Language) for "messages"?
TCL is a free "extension language" which somewhat resembles the Unix
shells, and is set up to be a *small* library that can be linked into C
code.  When one wants to report an event, one could format the arguments
of that event into strings, fetch a TCL command from a file, and execute
that TCL command.  It was intended to customise input to things like the
editor "mx", but there's no reason it couldn't be used to customise *output*.
As I say, not altogether serious.

-- 
The taxonomy of Pleistocene equids is in a state of confusion.


