why does -vi- set the hi bit when expanding `%' and `#'?

Dominic Dunlop domo at riddle.UUCP
Mon Jan 16 20:09:39 AEST 1989


[Already it's hard to keep track of who's quoting whom in this thread.
Sorry if I've got it wrong...]

In article <450 at oglvee.UUCP> norm at oglvee.UUCP (Norman Joseph) writes:
[Stuff about vi setting the high bit of each character in the filenames it
produces when expanding `%' and `#' on shell command lines omitted.]
>In article <15219 at mimsy.UUCP>, chris at mimsy.UUCP (Chris Torek) writes:
>> vi believes that by setting bit 7, it is quoting the file name,
>> so that if you are editing the file `foo*bar.c', the command
>> 
>> 	!echo %
>> 
>> produces [in effect]
>> 
>> 	!echo \f\o\o\*\b\a\r\.\c
>> 
>> in shell-internal-quoting format (bit 7 set).
>
>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in? (E-mail
>would be fine.  I'm sure people are falling asleep even as we speak :^).
>Is this the same as quoting sh meta-characters with '\'?
                                ^^^^
Yes, except that, strictly, the backslash can be used to quote any character:
it's just that the quoting is a no-op on any character other than a
metacharacter.  (Yes, this topic has scope for soporific semantic pedantry.)
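
[For the curious, here is a rough sketch of the equivalence Chris describes.
It is an illustration only, not code from vi or from any real shell: the
names quote_bit7, shell_read and backslash_form are made up for the example,
and a real shell keeps the quote bit internally rather than printing
backslashes.]

```python
QUOTE_BIT = 0x80  # bit 7, used here as the "this character is quoted" flag


def quote_bit7(name: bytes) -> bytes:
    """Set bit 7 on every byte, as vi is described as doing for % and #."""
    return bytes(b | QUOTE_BIT for b in name)


def shell_read(word: bytes):
    """Return (character, quoted?) pairs the way a bit-7-quoting shell
    would interpret them (hypothetical model, not real shell source)."""
    return [(chr(b & 0x7F), bool(b & QUOTE_BIT)) for b in word]


def backslash_form(word: bytes) -> str:
    """Show the same word with each quoted character written as \\c."""
    return "".join("\\" + c if quoted else c for c, quoted in shell_read(word))


# Every character of the file name ends up quoted, including the `*',
# so the shell does not treat it as a wildcard.
print(backslash_form(quote_bit7(b"foo*bar.c")))  # prints \f\o\o\*\b\a\r\.\c
```

An unquoted word passes through unchanged, so backslash_form(b"foo*bar.c")
would leave the `*' open to filename expansion.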

>Is this
>something I need to care about beyond being curious?

No.  Apart from anything else, it's obsolescent, and its use by
applications software has been deprecated for A Long Time (this deprecation
having been broadcast in the same way as information about the `feature'
itself -- that is, by word of mouth).  As I understand it, we finally get
to say goodbye to bit seven internal quoting with the System V, release 4
version of the shell.  It's possible that it's been eliminated in V.3.1 and
later as well.  Comments, anybody?

Why has it gone?  Because it's a real pain in the butt for users of
character sets which require all eight bits of a byte in order to represent
all alphabetic characters.  This turns out to mean most Europeans.
(Asian character sets are something else again.)  Having the shell
interpret that eighth bit as a quote, then clear it, mangles text which
includes characters (usually accented letters) which ANSI didn't think
of all those years ago.
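
[A small sketch of the mangling, assuming the text is in an eight-bit set
such as ISO 8859-1, where e-acute is the byte 0xE9. The function name
shell_strip is invented for the example, and it models only the "treat bit 7
as a quote, then clear it" step described above, not any particular shell.]

```python
QUOTE_BIT = 0x80  # the bit an old-style shell takes to mean "quoted"


def shell_strip(word: bytes) -> bytes:
    """Clear bit 7 of every byte, as a bit-7-quoting shell would after
    noting which characters were 'quoted' (hypothetical model)."""
    return bytes(b & ~QUOTE_BIT & 0xFF for b in word)


# "cafe" with an e-acute, encoded in ISO 8859-1 (assumed encoding).
name = b"caf\xe9"

# 0xE9 with bit 7 cleared is 0x69, which is ASCII `i': the accented
# letter is silently replaced by a different, unaccented one.
print(shell_strip(name))  # prints b'cafi'
```

Plain seven-bit ASCII names survive untouched, which is why the problem
went unnoticed for so long by people who never needed the eighth bit.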

The 1003.2 working group of the IEEE is drafting a standard for the shell
command language.  I don't have it to hand, but, as I recall, it
effectively outlaws eighth bit quoting in the shell.


