Internationalisation (was: NULL as a string terminator)

Fri Aug 24 16:42:03 AEST 1990

In article <3603 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1881 at jura.tcom.stc.co.uk>, rmj at tcom.stc.co.uk (Rhodri James) writes:
>> In article <3585 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>> }For why?  Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

  etc., deleted.

>
>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer.  Wonderful just.
>No, it is *not* simpler to maintain.  The point of the resource file
>approach (not my invention by any means; no-hopers like IBM, DEC, HP,
>X/Open, AT&T, Apple, ... have been using it for a while and I just
>copied the idea and simplified it a bit for this newsgroup) is that
>you have all the text in one place; you don't have to go "scrabbling
>about" in the source files to find all the strings.  You can give the
>resource file to a human translator who knows nothing about the
>programming language you are using.  A minor addition to such a tool
>(have it generate
>	INTEGER MSGNO
>	PARAMETER (MSGNO=......
>instead of #defines) will let you use the *same* message file with a
>Fortran program.  Speaking as a no-hoper, I must admit that using a
>technique that adapts to *all* the programming languages I use, not
>just C, sounds like a saving.  But what do I know?
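
   A rough sketch of the kind of tool described above, in C.  The
   message-file format (one "NAME text" pair per line, numbered in file
   order) and the function name emit_ids are assumptions made up for
   illustration; they are not any vendor's actual format or tool.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical message file: one message per line, "NAME text...".
 * Messages are numbered 1, 2, 3, ... in file order.
 * Lines are assumed to be shorter than 512 bytes.
 *
 * Writes one numeric identifier per message to `out`, as C #defines
 * or (if `fortran` is non-zero) as fixed-form Fortran PARAMETERs.
 * Returns the number of messages seen.
 */
int
emit_ids(FILE *in, FILE *out, int fortran)
{
	char line[512], name[64];
	int n = 0;

	assert(in != NULL && out != NULL);
	while (fgets(line, sizeof line, in) != NULL) {
		if (sscanf(line, "%63s", name) != 1)
			continue;	/* skip blank lines */
		n++;
		if (fortran) {
			/* fixed form: statements start in column 7 */
			/* names pass through unchanged; strict Fortran 77
			 * would want them shortened to six characters */
			fprintf(out, "      INTEGER %s\n", name);
			fprintf(out, "      PARAMETER (%s=%d)\n", name, n);
		} else {
			fprintf(out, "#define %s\t%d\n", name, n);
		}
	}
	return n;
}
```

   Calling emit_ids(fp, stdout, 0) on the message file would produce the
   header for the C sources, and emit_ids(fp, stdout, 1) the include file
   for the Fortran ones; the message text itself is never touched.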

   Indeed, an interesting proposition.  There are two immediate ways (I am
   sure the creative will find more) to support internationalization
   while still using labels.  Both let the extraction tools do their work,
   both are simple to implement, and both prevent the repetitious use of
   literals and constants.  Here goes:

If you reaaaaaly want the text in the source section (incidentally, xscc on
System V [your original example] does invoke the C preprocessor, so text
substitution is absolutely not broken under MNLS, and any extractor that
does not invoke the preprocessor should be considered broken) -

#define DOS_DCOMM_MSG	1
#define UNIX_DCOMM_MSG	2
#define DEF_DCOMM_MSG	3

#if DOS
#define DCOM_ERR_MSG	DOS_DCOMM_MSG
#elif UNIX
#define DCOM_ERR_MSG	UNIX_DCOMM_MSG
#else
#define DCOM_ERR_MSG	DEF_DCOMM_MSG
#endif

#define DCOM_ERR	getmsg(DCOM_ERR_MSG)

/* tools.c */

char * 
getmsg(ErrMsg)
int ErrMsg;
{
	switch (ErrMsg){
		case DOS_DCOMM_MSG:
			return "Run dcom.exe";
		case UNIX_DCOMM_MSG:
			return "Datacomm not initialized, contact S/A";
		case DEF_DCOMM_MSG:
			return "Datacomm not running";
		default:
			return "Run for cover, they're commin' to get us";
	}
}

/* somefile.c */

#include <stdio.h>	/* fprintf, stderr */

int 
CheckDatacomm()
{
	int RetVal;

	if ( (RetVal=DataCommRunning()) != 0)
		(void) fprintf(stderr,"%s\n",DCOM_ERR);

	return RetVal;
}

/* Makefile */

LANG = de fr sw gr

neatunix: main.o somefile.o tools.o 
	xscc -O main.o somefile.o tools.o -o neatunix 
	@for i in $(LANG); do gencat $@.X $$i.cat; done

neatdos: main.o somefile.o tools.o 
	xscc -O main.o somefile.o tools.o -o neatdos 
	@dosomethingelsealtogether

Another method would be to do something like the following (assuming that
you are invoking the C preprocessor):

#define DCOM_ERR	0
#define DRVR_ERR	1  /* etc. etc. */

char *ErrMsg[]={
#if DOS
				"Run dcom.com",
				"Run driver.com",
#elif UNIX
				"Datacomm not initialized, contact S/A",
				"Driver error, contact S/A",
#else
				"Datacomm not running",
				"Driver not responding",
#endif
};

#define MSG_ERR_DCOM	ErrMsg[DCOM_ERR]
#define MSG_ERR_DRVR	ErrMsg[DRVR_ERR]

int
foo()
{
	int Dcm, Dvr;
	.
	.
	.
	if (!Dcom())
		printf("%s",MSG_ERR_DCOM);

	if ( SomeDriverCheck() == FAILURE)
		printf("%s",MSG_ERR_DRVR);
	.
	.
	.
	return somevalue_etc;
}		

So, either way, we have accomplished coding for purposes of
internationalization: we have separated string literals to a central place,
and we have made the code more maintainable, since changes in messages for
the environment can occur at one major juncture.  And life is a cabaret.

(BTW, all the above just got retyped at max speed, so errors are surely
there and to be expected; the point remains.)

>
>As for efficiency, the point is that we are talking about a scheme for
>generating messages for display to humans.  The cost of fishing the text
>out of a file is (or was every time I measured it) considerably less than
>the cost of displaying it on the terminal.

   A program that pays no concern to "internationalization" never has to
   source anything external to its data segment beyond its normal
   operations, so to say that the additional overhead is equal to or less
   than existing overhead is a non sequitur.  If you don't do it, the cost
   ain't there.

>
>The real schemes (such as the X/Open one) identify messages by numbers,
>not by address in the text file.  That has the disadvantage that finding
>the right text is a wee bit more complex (but not very; one need merely
>attach a directory at the end of the file), but it has the great
>advantage that the program does not need to be recompiled.  This means
>that one customer can be running the program with messages coming from
>the "English-speaking idiot" message file and another with messages
>coming from the "Spanish-speaking wizard" message file, and both can be
>sharing the same copy of the program without any recompilation at all.

   like MNLS, perhaps?

>
>That's the way it *is* in UNIX System V Release 4.  We might as well get
>used to thinking about messages in that way now.

  and it is not such a horrible thing.  Just think, we can pop streams
  modules for the simple stuff, and run extractors and programs to modify
  the source for multibyte character sets, and use different curses
  libraries for right to left output.  What a treasure.

  It has been pointed out here by several who are in the know on these
  things that arguing about string literals is moot in comparison to other
  inherent difficulties presented by internationalization, and that the
  necessary crusade on "C programming practices" is a long time commin'.

  For instance, I am told that the following is a problem in Kanji

  char p[10]; /* xscc provides for allowing twenty bytes as needed in Kanji */

  *(p+1)='x'; /* this is the next byte, and an error */
  p[n+1]='x'; /* this is the next _character_ and ok */

   Given trivial differences like this, I am sure that there are many
   things "broken" for internationalization, and we should all prepare to
   cringe; however, substitution for string literals and constants is not
   one of them.
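
   In standard C, p[i] and *(p+i) mean exactly the same thing, so without
   compiler help the byte/character distinction has to be made by hand.
   A minimal sketch of stepping by character instead of by byte, assuming
   EUC-JP encoded text (an assumption chosen for illustration; the
   function names are hypothetical and this is not what xscc does
   internally):

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the EUC-JP character whose first byte is `lead`. */
static size_t
euc_len(unsigned char lead)
{
	if (lead == 0x8F)
		return 3;	/* SS3 prefix + two bytes */
	if (lead == 0x8E || (lead >= 0xA1 && lead <= 0xFE))
		return 2;	/* SS2 prefix + one byte, or a two-byte kanji/kana */
	return 1;		/* ASCII (or an invalid byte, taken singly) */
}

/* Return a pointer to character number n (0-based) of s, or NULL if
 * s holds fewer than n+1 characters.  Input is assumed well formed. */
const char *
euc_nth(const char *s, size_t n)
{
	assert(s != NULL);
	while (*s != '\0' && n > 0) {
		size_t len = euc_len((unsigned char)*s);

		/* advance one character, never past the terminator */
		while (len > 0 && *s != '\0') {
			s++;
			len--;
		}
		n--;
	}
	return (*s != '\0') ? s : NULL;
}
```

   With a buffer holding two double-byte characters followed by 'x',
   euc_nth(buf, 1) points two bytes in, whereas buf+1 lands in the middle
   of the first character.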

>
>> Demonstrate to me a negative impact on internationalisation (ugh) and I
>> might believe you.  Any negative impact will do, I'm not too choosy.
>
>The schemes actually used by IBM (MVS, CMS, AIX) HP (HP-UX), DEC (VMS,
>Ultrix), AT&T (SVR4) and others essentially add another couple of layers
>of indirection above what I presented.  Those systems all allow you to
>switch languages at run time, without any recompilation.  Those systems
>all allow you to translate message files without having any other access
>to the sources.  They all allow many programs, and many programming
>languages, to share the same message files.  They all allow a customer
>to substitute his own translation of a message file (perhaps amplifying
>some messages, or getting the grammar right, or ...) without access to
>the sources.

   And still can.  xscc in Unix System V (your example) does all of this
   for you.  You need not make the resource catalogues.  It is done
   for you.  

>
>There's four negative impacts of the #ifdef approach, just for starters.

  Given the above examples, do you still feel this to be the case?

  I do not think so.  I also believe that this shows that it is an unsafe
  practice to say that something cannot be done within the framework of C
  and the C preprocessor. 

cbp
--------
Of course these are opinions.