Internationalisation (was: NULL as a string terminator)
Chris Preston
cbp at icc.com
Wed Aug 29 14:35:13 AEST 1990
In article <3617 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1990Aug24.064203.20942 at icc.com>, cbp at icc.com (Chris Preston) writes:
>> If you reaaaaaly want the text in the source section (incidentally, xscc on
>> System V [your original example] does invoke the C preprocessor
>
>No, xscc was *not* my example nor anyone else's in this thread before this.
No one else in the thread before this talked about X/Open and ANSI's
handling of multi-language portability on System V R4.
The standard offering on System V Release 4 for doing
"internationalization" _is_ MNLS. I have a fax of the price schedule
from UNIX research laboratories dated June 22 in front of me.
MNLS has xscc to produce message files (or catalogues)
from the application as part of the compilation process. Gencat is then
used to build the message catalogue for each target language.
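At run time, a program built this way opens the catalogue that gencat produced and looks each message up by set and message number. The sketch below assumes the X/Open catopen()/catgets() interface from <nl_types.h>. The helper msg_lookup() is hypothetical, and the set and message numbers are illustrative only.

```c
#include <nl_types.h>

/*
 * Fetch message `num` from set `set` of an open X/Open message catalogue.
 * If no catalogue could be opened, or the message is missing from it,
 * the compiled-in default text is returned instead.
 */
const char *msg_lookup(nl_catd catd, int set, int num, const char *dflt)
{
    if (catd == (nl_catd) -1)      /* catopen() failed: use the default */
        return dflt;
    /* catgets() itself returns dflt when the message is not found */
    return catgets(catd, set, num, dflt);
}
```

A caller might open the catalogue once with catopen("myapp", NL_CAT_LOCALE), where "myapp" is a hypothetical catalogue name. It would then pass the descriptor to msg_lookup() and call catclose() at exit. If no catalogue is installed, the compiled-in English text is used, so a translated catalogue can be dropped in at the customer site without recompiling.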
Your example was Sys V R4, and these are simply the tools that anyone
doing internationalization work on it is likely to use. In particular,
contractors building products for vendors that repackage System V R4 on
their own boxes will stay with these tools for the export systems, as will
third-party vendors whose applications are offered with the base system.
Rolling your own extractor makes less sense for System V R4: it has so
many standards rolled into one that it is best to stick with the tools
offered. Otherwise, one is likely to find that home-grown extractors and
resource catalogues may not match the format produced by MNLS or conform
to X/Open.
>I mentioned System V Release 4, to be sure, but I did not mention xscc.
IMHO this is like mentioning Unix software development and not mentioning
the C compiler and development system. There are certainly BASIC
compilers, ratfor and f77, but these would not normally be the assumed
development tools.
>How on earth is using xscc supposed to help me use the same message
>file for C, Pascal, Fortran, and Lisp?
Because it produces an external message file that can be translated into
multiple languages using gencat, can be modified at the customer site and
is going to be about as portable to Pascal as is the "awk extracted"
version that was previously offered.
>
>> so text substitution is absolutely not broken under MNLS
>
>Whoever said it was?
>
>> Another method would be to do something like the following (assuming that
>> you are invoking the C preprocessor):
This example deleted.
>Again, this technique means that you need the sources, and that to
>change the messages you need access to the sources and to recompile.
>That was an objection validly raised against the stripped-down message
>file technique I posted, and it applies with greater force to this.
There were two examples, one of which was to use labels and to keep the
string literals in a single place, in functions like geterrmsg(),
getusermsg() and so forth. This was somehow deleted, but it works
without difficulty when the messages are extracted from the start. Under
the proposed technique, the awk extractor would produce:
#define USER_MSG0 0
#define USER_MSG1 1
#if DOS
#define CONTINU_MSG USER_MSG0
#elif UNIX
#define CONTINU_MSG USER_MSG1
#endif
/* etc. */
#define CONTINUE getusermsg(CONTINU_MSG)
char *getusermsg(UserMsgNo)
int UserMsgNo;
{
    switch (UserMsgNo) {
    case USER_MSG0:
        return ExternMsgGet(soveval);
where before the awk extractor the return value was
return "type any key to continue";
and the application says
printf("%s\n",CONTINUE);
This is, of course, only an example and is subject to one's own methods
and style.
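A self-contained version of the label example above, as it might look before the awk extractor runs, can be sketched as follows. The sketch uses ANSI-style definitions rather than K&R. It treats DOS as a macro supplied by the build (e.g. cc -DDOS), with UNIX as the default, and the USER_MSG1 wording is invented for illustration.

```c
#include <stdio.h>

/* Message labels, numbered as in the example above. */
#define USER_MSG0 0
#define USER_MSG1 1

/* Pick the platform's wording at compile time. DOS is assumed to be
   defined by the build; any other build gets the UNIX wording. */
#ifdef DOS
#define CONTINU_MSG USER_MSG0
#else
#define CONTINU_MSG USER_MSG1
#endif

#define CONTINUE getusermsg(CONTINU_MSG)

/* The only place the literals live. An extractor can replace each
   return with a catalogue lookup without touching any caller. */
const char *getusermsg(int user_msg_no)
{
    switch (user_msg_no) {
    case USER_MSG0:
        return "type any key to continue";
    case USER_MSG1:
        return "press RETURN to continue";   /* hypothetical wording */
    default:
        return "";
    }
}

void prompt(void)
{
    printf("%s\n", CONTINUE);   /* the caller sees only the label */
}
```

Because callers only ever use CONTINUE, replacing each return in getusermsg() with a lookup such as ExternMsgGet() changes no calling code.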
>
My comments deleted.
>
>The point of a message file is that
> -- the "central place" is OUTSIDE THE PROGRAM
> -- a message file can be got at by someone with no (other) access to sources
> (this is a *big* deal for developers!)
> -- *one* version of the object file can be shared by people using
> *different* message files.
The point is that an intelligent use of labels can allow
-- the "central place" is OUTSIDE THE PROGRAM
-- a message file can be got at by someone with no (other) access to
sources (which is certainly a big deal with our products)
-- *one* version of the object file can be shared by people using
*different* message files.
-- code can hide machine dependencies for string literals and
constants in label form and "internationalization" is _not_ broken.
>
>> >As for efficiency, the point is that we are talking about a scheme for
>> >generating messages for display to humans. The cost of fishing the text
>> >out of a file is (or was every time I measured it) considerably less than
>> >the cost of displaying it on the terminal.
>>
My comments deleted.
>
>That's non-sequitUr, and this "rebuttal" is badly flawed.
This is not a debate.
As to using external message files: if the application is in
a native language, it would not hurt to compile a straight version
without the messages extracted, and to use a standard tool to do the
extraction for a multi-language version.
In that case, the multi-language version will spend either no
additional time or some additional time "fishing" the text out of a
file. Yes, if some form of memory mapping is used so that the
messages are mapped into memory, then great: there is no difference in
speed for the multi-language version. That is the best case. The worst
case is that it takes additional time and slows the application down.
The multi-language version will never be faster than the native
version: at best it is as fast, and at worst it is slower. That the
additional delay is smaller than some other delay, such as display time,
is not significant. The display of messages, in English or Kanji, will
occur at some point in most applications, period. Whether fetching the
message adds overhead or not, comparing that overhead with the
guaranteed display time of the message is not a valid comparison.
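The "straight" native build and the extracted multi-language build can be selected at compile time. This is a minimal sketch that assumes a hypothetical USE_CATALOG macro, a hypothetical catalogue name "myapp", and the X/Open catgets() interface. The native build compiles the literal in directly and pays nothing for it. The multi-language build pays for a lookup that falls back to the same literal.

```c
#include <stdio.h>

#ifdef USE_CATALOG
#include <nl_types.h>

static nl_catd msg_catd = (nl_catd) -1;

/* Open the catalogue once at start-up ("myapp" is hypothetical). */
static void msg_init(void)
{
    msg_catd = catopen("myapp", NL_CAT_LOCALE);
}

/* Multi-language build: look the message up, falling back to the literal. */
#define MSG(num, text) \
    (msg_catd == (nl_catd) -1 ? (text) : catgets(msg_catd, 1, (num), (text)))
#else
/* Native build: no catalogue, the literal is compiled in directly. */
static void msg_init(void) { }
#define MSG(num, text) (text)
#endif

#define MSG_CONTINUE 1   /* hypothetical message number */

static void prompt_continue(void)
{
    printf("%s\n", MSG(MSG_CONTINUE, "type any key to continue"));
}
```

Compiling with cc -DUSE_CATALOG gives the multi-language version. Without it, MSG() reduces to the bare string literal, so the native version carries no lookup overhead at all.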
>What I claimed was
> (cost of fetching message) << (cost of displaying message)
The actual point appears to be that the additional delay will be either
some or none, irrespective of what it is compared with.
This is not an argument for or against having message files but rather
an additional performance consideration.
>Someone with measurements to disprove this can refute me (for a particular
Various explanations about virtual mapping deleted.
My comments about additional concerns in programming practises for
internationalization deleted.
>
>That is why, for example, ANSI C has
> wchar_t
> wcstombs()
> mbstowcs()
> mblen()
>and so on, and why it is set up to allow multi-byte characters in
>constants.
And why much code must be rewritten to the ANSI standard. By the same
token, a great deal of development is done with non-ANSI-compliant
compilers, because that is what is native to the system and that is what
the prime contractor requires for the application development
(witness Open Desktop). It is, therefore, not just a matter of
"well, let's just use gcc or buy some ANSI-compliant compiler with the
appropriate libraries." It is like talking about using only POSIX-compliant
system calls on an earlier release of System V that is not POSIX
compliant.
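Those ANSI functions matter for message files for one concrete reason. Once messages are translated into, say, Kanji, a message may be multibyte, and strlen() counts bytes rather than characters. The sketch below uses only ANSI C mblen(), and mb_char_count() is a hypothetical helper. A real program would call setlocale(LC_ALL, "") first so that the user's locale, rather than the "C" locale, governs the encoding.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Count characters (not bytes) in a multibyte string using mblen().
 * Returns -1 if an invalid multibyte sequence is found.
 */
long mb_char_count(const char *s)
{
    size_t left = strlen(s);
    long count = 0;

    (void) mblen(NULL, 0);          /* reset any shift state */
    while (left > 0) {
        int len = mblen(s, left);   /* bytes in the next character */
        if (len <= 0)
            return -1;              /* invalid or incomplete sequence */
        s += len;
        left -= (size_t) len;
        count++;
    }
    return count;
}
```

This kind of character-aware handling is what a pre-ANSI compiler and library cannot offer, which is part of why such code has to be rewritten.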
>
>> >There's four negative impacts of the #ifdef approach, just for starters.
>
>> Given the above examples, do you still feel this to be the case?
>
>Of course. Those four negative impacts still stand.
The example that you deleted would negate all four of these negative
impacts.
>
My comments deleted
>
>Again, who said _that_? Not me! That there are *better* ways to do some
>things than using the C preprocessor, who can challenge that? The only
>question is, _which_ tasks? Given that I said I would like to share
>message files between several programming languages, using a facility
>peculiar to one of them (there is no guarantee that /usr/lib/cpp will be
To anticipate a version of C that does not perform a preprocessing
stage is an interesting prospect. To anticipate using other languages as
a programming consideration when coding in C is probably beyond the
bounds of this newsgroup, and not likely to be the concern of those whose
applications are written entirely in C. C is often chosen (as it is
here) _because_ of its portability between machines. Your examples and the
drift of the discussion indicate that portability to Pascal, Fortran and
Lisp is worthwhile. Perhaps this is true in some cases, but it is just
as valid to rely on labels for literals and constants in order to
port the same C code, which is what we do here.
>available nor anything like it) would be rather silly, wouldn't it?
For something other than C, yes. Given the newsgroup, no.
>
Various comments about ANSI and X/Open deleted.
>
>
Comment about TCL deleted.
In summary, using labels for string literals is a good thing, and can be
done without "breaking internationalization" as was previously
suggested.
cbp
------
Recent conversation between Kurt Waldheim and Saddam Hussein:
"Saddam, I *knew* Hitler, and believe me, you're no Adolf Hitler."