Internationalisation (was: NULL as a string terminator)

Wed Aug 29 14:35:13 AEST 1990

In article <3617 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1990Aug24.064203.20942 at icc.com>, cbp at icc.com (Chris Preston) writes:
>> If you reaaaaaly want the text in the source section (incidentally, xscc on
>> System V [your original example] does invoke the C preprocessor
>
>No, xscc was *not* my example nor anyone else's in this thread before this.

  No one else in the thread before this talked about X/Open's and ANSI's
  handling of multi-language portability on System V R4.

  The standard offering on System V Release 4 for doing
  "internationalization" _is_ MNLS.  I have a fax of the price schedule
  from UNIX research laboratories dated June 22 in front of me.

  MNLS has xscc to produce message files (or catalogues)
  from the application as part of the compilation process.  Gencat is then
  used to produce the message file in each applicable language.

  Your example was Sys V R4.  xscc is simply one of the tools that anyone
  doing internationalization work there is likely to use.  In particular,
  contractors building products for vendors that repackage System V R4 on
  their own boxes will stay with these tools for the export systems, as will
  third party vendors whose applications are offered with the base system.

  Rolling your own extractor is less attractive on System V R4, since it
  has so many standards rolled into one that it is best to stick with the
  tools offered.  Otherwise, one is likely to find that personalized
  extractors and resource catalogues might not match the format produced
  for MNLS or conform with X/Open.

>I mentioned System V Release 4, to be sure, but I did not mention xscc.

  IMHO this is like mentioning Unix software development and not mentioning
  the C compiler and development system.  There are certainly basic
  compilers, ratfor and f77, but these would not be the assumed development
  tools normally.

>How on earth is using xscc supposed to help me use the same message
>file for C, Pascal, Fortran, and Lisp?

  Because it produces an external message file that can be translated into
  multiple languages using gencat, can be modified at the customer site and
  is going to be about as portable to Pascal as is the "awk extracted"
  version that was previously offered.
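
  As a minimal sketch of the reading side: the post does not say how the
  program fetches text from a gencat catalogue, so this assumes the X/Open
  catopen()/catgets() interface is available.  The catalogue name
  "cbp_demo_msgs", the set and message numbers, and the helper app_msg()
  are all hypothetical.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical numbering; the catalogue source given to gencat would
   have to define a matching set and message. */
#define MSG_SET      1
#define MSG_CONTINUE 1

static nl_catd app_cat = (nl_catd)-1;  /* (nl_catd)-1 means "not open" */
static int app_cat_tried = 0;

/* Return message msg_no from the catalogue, or the built-in fallback
   text when the catalogue or the message cannot be found. */
const char *app_msg(int msg_no, const char *fallback)
{
    if (!app_cat_tried) {
        /* "cbp_demo_msgs" is an assumed name, located via NLSPATH */
        app_cat = catopen("cbp_demo_msgs", 0);
        app_cat_tried = 1;
    }
    if (app_cat == (nl_catd)-1)
        return fallback;
    return catgets(app_cat, MSG_SET, msg_no, fallback);
}
```

  A caller would then write something like
  printf("%s\n", app_msg(MSG_CONTINUE, "type any key to continue"));
  and the same object file keeps working, in English, when no catalogue
  is installed.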

>
>> so text substitution is absolutely not broken under MNLS
>
>Whoever said it was?
>
>> Another method would be to do something like the following (assuming that
>> you are invoking the C preprocessor):

  This example deleted.

>Again, this technique means that you need the sources, and that to
>change the messages you need access to the sources and to recompile.
>That was an objection validly raised against the stripped-down message
>file technique I posted, and it applies with greater force to this.

  There were two examples, one of which was to use labels and to have the
  string literals in a single place in functions like geterrmsg(),
  getusermsg() and so forth.  This was somehow deleted, but it will work
  without difficulty when extracting the messages from the start, and under
  the proposed technique (awk extractor) would produce:

#define USER_MSG0 0
#define USER_MSG1 1

#if DOS
#define CONTINU_MSG 0
#elif UNIX
#define CONTINU_MSG 1
/* etc. */
#endif

#define CONTINUE getusermsg(CONTINU_MSG)

  char *getusermsg(UserMsgNo)
  int UserMsgNo;	/* message number, e.g. USER_MSG0 */
  {

    switch (UserMsgNo){
	  case USER_MSG0:
		return ExternMsgGet(soveval);

where before the awk extractor the return value was

		 return "type any key to continue";

and the application says

    printf("%s\n",CONTINUE);

This is, of course, an example and is subject to one's own methods and
style.
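
  Filled out so that it compiles, the fragment above might look like the
  following sketch.  getusermsg(), ExternMsgGet(), USER_MSG0, CONTINU_MSG
  and CONTINUE come from the example; the in-memory table, SetExternMsg()
  and the text for USER_MSG1 are assumptions standing in for whatever
  reads the real message file.

```c
#include <stdio.h>

#define USER_MSG0 0
#define USER_MSG1 1
#define USER_MSG_COUNT 2

#define CONTINU_MSG USER_MSG0
#define CONTINUE getusermsg(CONTINU_MSG)

/* Assumed stand-in for the external message file: a table that a real
   program would fill from the file at start-up. */
static const char *extern_msgs[USER_MSG_COUNT];

/* Hypothetical helper: install an externally supplied message. */
void SetExternMsg(int msg_no, const char *text)
{
    if (msg_no >= 0 && msg_no < USER_MSG_COUNT)
        extern_msgs[msg_no] = text;
}

/* Return the external text for msg_no, or NULL if there is none. */
const char *ExternMsgGet(int msg_no)
{
    if (msg_no < 0 || msg_no >= USER_MSG_COUNT)
        return NULL;
    return extern_msgs[msg_no];
}

/* The single place where the built-in literals live. */
const char *getusermsg(int msg_no)
{
    const char *text = ExternMsgGet(msg_no);

    if (text != NULL)
        return text;
    switch (msg_no) {
    case USER_MSG0:
        return "type any key to continue";
    case USER_MSG1:
        return "press RETURN to continue";  /* assumed wording */
    default:
        return "";
    }
}
```

  The application line printf("%s\n", CONTINUE); is unchanged whether the
  text comes from the built-in literal or from the external table.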

>
   My comments deleted.

>
>The point of a message file is that
> -- the "central place" is OUTSIDE THE PROGRAM
> -- a message file can be got at by someone with no (other) access to sources
>    (this is a *big* deal for developers!)
> -- *one* version of the object file can be shared by people using
>    *different* message files.

  The point is that an intelligent use of labels still allows all of this:
	-- the "central place" is OUTSIDE THE PROGRAM
	-- a message file can be got at by someone with no (other) access to
	   sources (which is certainly a big deal with our products)
	-- *one* version of the object file can be shared by people using
	   *different* message files.
	-- code can hide machine dependencies for string literals and
	   constants in label form and "internationalization" is _not_ broken.

>
>> >As for efficiency, the point is that we are talking about a scheme for
>> >generating messages for display to humans.  The cost of fishing the text
>> >out of a file is (or was every time I measured it) considerably less than
>> >the cost of displaying it on the terminal.
>> 

  My comments deleted.

>
>That's non-sequitUr, and this "rebuttal" is badly flawed.

  This is not a debate.

  As to using external message files: if the application is in a single
  native language, it would not hurt to compile a straight version
  without the messages extracted, and use a standard tool to do the
  extraction for a multi-language version, for example.

  In such a case, the multi-language version will spend either no
  additional time or some additional time "fishing" the text out of a
  file.  Yes, if some form of memory mapping is used so that the
  messages are mapped into the heap, then great, there is no difference in
  speed for the multi-language version.  That is the best case scenario.
  The worst case scenario is that it will take additional time and slow
  the application down.

  The multi-language version will not be any faster than the native
  version, and at worst will be slower.  That the additional delay is less
  than some other delay, like display time, is not significant.  The
  display of messages, in English or Kanji, will occur at some point in
  most applications, period.  Whether fetching the message adds overhead
  or not, comparing that overhead to the guaranteed display time of the
  message is not a valid comparison.

>What I claimed was
>	(cost of fetching message) << (cost of displaying message)

  The actual point appears to be that the additional delay will be either
  some or none, irrespective of what is chosen for comparison.

  This is not an argument for or against having message files but rather
  an additional performance consideration.
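
  The best case described above, where the messages are already in memory
  when they are needed, can be approximated without real memory mapping by
  reading the file once at start-up.  This is only a sketch: the
  one-message-per-line file format and the names load_messages() and
  get_message() are assumptions, not part of MNLS or X/Open.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MSGS 64

static char *msg_buf;            /* whole message file, held on the heap */
static char *msg_tab[MAX_MSGS];  /* start of each message in msg_buf */
static int   msg_count;

/* Read a one-message-per-line file into memory once.
   Returns the number of messages loaded, or -1 on failure. */
int load_messages(FILE *fp)
{
    long size;
    size_t got;
    char *p;

    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return -1;
    rewind(fp);

    msg_count = 0;
    free(msg_buf);
    msg_buf = malloc((size_t)size + 1);
    if (msg_buf == NULL)
        return -1;
    got = fread(msg_buf, 1, (size_t)size, fp);
    msg_buf[got] = '\0';

    /* Split the buffer in place: each newline becomes a terminator. */
    for (p = msg_buf; *p != '\0' && msg_count < MAX_MSGS; ) {
        char *nl = strchr(p, '\n');
        msg_tab[msg_count++] = p;
        if (nl == NULL)
            break;
        *nl = '\0';
        p = nl + 1;
    }
    return msg_count;
}

/* After loading, a fetch is an array index: no further file access. */
const char *get_message(int n, const char *fallback)
{
    return (n >= 0 && n < msg_count) ? msg_tab[n] : fallback;
}
```

  With this arrangement the extra cost is paid once, at load time, and
  each later fetch costs about the same as returning a string literal.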

>Someone with measurements to disprove this can refute me (for a particular

  Various explanations about virtual mapping deleted.

  My comments about additional concerns in programming practices for
  internationalization deleted.

>
>That is why, for example, ANSI C has
>	wchar_t
>	wcstombs()
>	mbstowcs()
>	mblen()
>and so on, and why it is set up to allow multi-byte characters in
>constants.

  And why much code must be rewritten to use the ANSI standard.  By the
  same token, a great deal of development is done with non-ANSI compliant
  compilers because that is what is native to the system and that is what
  the prime contractor requires be used in the application development
  (witness Open Desktop).  It is, therefore, not just a matter of
  "well, let's just use gcc or buy some ANSI compliant compiler with the
  appropriate libraries."  It is like talking about using only POSIX
  compliant system calls on an earlier release of System V that is not
  POSIX compliant.
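
  For readers who do have an ANSI compiler and library, a minimal sketch
  of the calls listed above follows.  The wrapper names count_chars() and
  to_wide() are hypothetical; a real program would normally call
  setlocale(LC_ALL, "") first so that the user's multibyte encoding is
  in effect, which this sketch leaves to the caller.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count characters (not bytes) in a multibyte string using mblen().
   Returns -1 if an invalid sequence is found. */
long count_chars(const char *s)
{
    long n = 0;

    (void)mblen(NULL, 0);              /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);
        if (len <= 0)
            return -1;
        s += len;
        n++;
    }
    return n;
}

/* Convert src to wide characters in dst (capacity max elements),
   always terminating dst.  Returns the number of wide characters
   stored, or (size_t)-1 on an invalid sequence or zero capacity. */
size_t to_wide(wchar_t *dst, const char *src, size_t max)
{
    size_t n;

    if (max == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, max - 1);
    if (n == (size_t)-1)
        return n;
    dst[n] = L'\0';
    return n;
}
```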

>
>> >There's four negative impacts of the #ifdef approach, just for starters.
>
>>   Given the above examples, do you still feel this to be the case?
>
>Of course.  Those four negative impacts still stand.

  The example that you deleted would negate all four of these negative
  impacts.

>

   My comments deleted
>
>Again, who said _that_?  Not me!  That there are *better* ways to do some
>things than using the C preprocessor, who can challenge that?  The only
>question is, _which_ tasks?  Given that I said I would like to share
>message files between several programming languages, using a facility
>peculiar to one of them (there is no guarantee that /usr/lib/cpp will be

  To anticipate a version of C that does not perform a preprocessing
  stage is an interesting prospect.  To anticipate using other languages as
  a programming consideration when coding in C is probably beyond the
  bounds of this newsgroup and not likely to be the concern of those whose
  applications are done completely in C.  C is oftentimes chosen (like
  here) _because_ of its intermachine portability.  Your examples and the
  drift of the discussion indicate that portability to Pascal, Fortran and
  Lisp is worthwhile.  Perhaps this is true in some cases, but it is just
  as applicable to rely on labeling for literals and constants in order to
  port the same C code, which is what we do here.

>available nor anything like it) would be rather silly, wouldn't it?

  For something other than C, yes.  Given the newsgroup, no.

>

  Various comments about ANSI and X/Open deleted.

>
>
   Comment about TCL deleted.

   In summary, using labels for string literals is a good thing, and can be
   done without "breaking internationalization" as was previously
   suggested.  

cbp
------
Recent conversation between Kurt Waldheim and Saddam Hussein:
  "Saddam, I *knew* Hitler, and believe me, you're no Adolf Hitler."