why does -vi- set the hi bit when expanding `%' and `#'?

Guy Harris guy at auspex.UUCP
Thu Jan 12 09:04:06 AEST 1989


>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in?

Inside most versions of the Bourne, C, and Korn shells (and maybe the V6
and PWB shells as well), strings containing quoted characters (yes,
"quoted" as in "protected either with double-quotes or single-quotes, or
with a backslash," so yes,

>Is this the same as quoting sh meta-characters with '\'?

it is the same) are represented by turning on the 8th bit of each byte
containing a quoted character.  "vi", in a rather slimy move, "knew"
that this was the case, and instead of using, say, backslashes or
single-quotes to quote characters in file names, it turned on the 8th
bit of the bytes containing those characters, under the assumption that

	1) the 8th bit would be passed through the shell intact

and

	2) would thus be interpreted as meaning the characters were
	   quoted.

Unfortunately, more recent versions of the Bourne and Korn shells do
*not* use the 8th bit for this purpose, because they support 8-bit
character sets.  As such, while 1) is true, 2) isn't.

>Is this something I need to care about beyond being curious?

It's useful to keep the "8th bit" convention in mind if you may be
working on a system whose shell uses it (older - pre S5R3 - Bourne
shells, older - pre-"ksh-i" Korn shells, and all currently-available
versions of the C shell that I know of), since you won't be able to use
8-bit character sets when typing commands to those shells.  If your OS
supports file names with 8-bit characters, for example, and a file with
such characters in its name is created, you may have trouble removing it
if you are using such a shell.

It's also useful to keep in mind that using the 8th bit in such a
fashion - or other fashions - interferes with support for 8-bit
character sets, such as the ISO 8859 character sets that include
accented characters for Western European languages other than English.


