Hex escape for quoted multibyte character

Doug Gwyn gwyn at smoke.BRL.MIL
Wed Apr 26 17:57:09 AEST 1989


In article <101058 at sun.Eng.Sun.COM> kuro%shochu at Sun.COM (Teruhiko Kurosaka - Sun Intercon) writes:
>	char *the_multibyte_char="\x8eabcd";		/* I-1 */

No, other than the null-byte terminator there is just one char in that string.
Its value is implementation-dependent but is very likely either 0x8E or 0xCD.

>However, I noticed, the draft sometimes uses the word "character" and
>"byte" interchangeably.

It always uses these terms interchangeably; the difference is merely one of
emphasis.  See their definitions in section 1.6.  Note also that "multibyte
character" is defined as a separate concept, and that the occurrence of the
word "character" in the phrase "multibyte character" is not covered by the
definition given for just "character".  This is an unfortunate property of
technical English, and perhaps we should have invented some other name for
"multibyte character", but nobody could think of an acceptable alternative.

>	char *the_multibyte_char="\x8e\xab\xcd";	/* I-2 */

Correct.  You could also simply place the Kanji or whatever character
directly between the " marks, although that would make your source code
less portable, since different implementations would interpret the bytes
in your multibyte source character in different ways, some of them perhaps
invalid syntactically.  (For example, one of the bytes might represent the
" mark in some other implementation.)

>	wchar_t *the_wide_char_str=L"\xbcde";		/* II-1 */

Correct.

>	whcar_t *the_wide_char_str=L"\xbc\xde";		/* II-2 */
[       wchar_t]

No, this string contains three distinct values: 0x00BC, 0x00DE, and 0x0000.

>	whcar_t the_wide_char=L'\xbcde';		/* III-1 */
[       wchar_t]

Correct, assuming you fix the typographic error as indicated.

>My personal choices are I-2, II-1 and III-1.

The Standard agrees with you (or vice versa).


