POSIX Regular Expression Funniness

Donn Terry donn at hpfcdc.HP.COM
Tue Jan 31 10:50:05 AEST 1989


Ken Arnold's point about [[:alpha:]] is well taken.  I suspect that if
the proposal had been as he suggests, someone else would be saying
that [:alpha:] must mean :,a,l,p,h,a, with : specified twice, for backwards
compatibility.  Maybe not, but in the standards business it's easy to get
paranoid, because for practically any possibly controversial point there are
at least 2**n (where n is the number of participants) viewpoints before
everything gets settled. (Well, maybe 2*n :-) ).
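
To make the two readings concrete, here is a minimal sketch in Python.
Python's re module has no POSIX character classes, so it reads [:alpha:]
as exactly the plain bracket set described above.  The helper names
legacy_matches and class_matches are made up for this illustration, and
str.isalpha is only a rough stand-in for what [[:alpha:]] would match in
a POSIX regex library.

```python
import re


def legacy_matches(ch: str) -> bool:
    """Backwards-compatible reading: [:alpha:] is a plain bracket set
    containing the characters ':', 'a', 'l', 'p', 'h'."""
    return re.fullmatch(r"[:alpha:]", ch) is not None


def class_matches(ch: str) -> bool:
    """Assumed approximation of the class reading [[:alpha:]]:
    any single alphabetic character."""
    return len(ch) == 1 and ch.isalpha()


if __name__ == "__main__":
    # ':' is accepted only by the legacy reading, 'z' only by the class reading.
    for ch in ":alphz":
        print(repr(ch), legacy_matches(ch), class_matches(ch))
```

The two readings disagree on both ':' and 'z', which is why an
unbracketed [:alpha:] could not be given the class meaning without
changing the behaviour of existing patterns.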

On Doug Gwyn's comments about [:ch:]: as far as character classes go,
these are specified by the natural language involved.  My Spanish is
weak, but the *two characters* ch are treated as a single symbol with
its own place in the collating sequence.  c and h can also appear
independently, but when adjacent they are collated as another symbol.
This is arguably a kluge, but it antedates the computer business by a
few hundred years, and a few million users, so I doubt we can change it
just for the sake of aesthetics.
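
The collation rule above can be sketched as a sort key that treats the
pair "ch" as one symbol ranked between "c" and "d".  This is only an
illustration under stated assumptions: the names SYMBOLS, RANK, and
collation_key are hypothetical, input is assumed to be lowercase a-z,
and the other special cases of traditional Spanish ordering are left
out.

```python
import string

# Assumed simplified alphabet: a-z with "ch" inserted as its own
# collating symbol immediately after "c".
SYMBOLS = list(string.ascii_lowercase)
SYMBOLS.insert(SYMBOLS.index("c") + 1, "ch")
RANK = {sym: i for i, sym in enumerate(SYMBOLS)}


def collation_key(word: str) -> list[int]:
    """Split word into collating symbols, taking an adjacent 'c','h'
    as the single symbol 'ch', and return the list of their ranks."""
    key = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[word[i]])  # KeyError on anything outside a-z
            i += 1
    return key


if __name__ == "__main__":
    words = ["dedo", "chico", "cosa", "curva"]
    print(sorted(words))                     # plain character order
    print(sorted(words, key=collation_key))  # "ch" sorts after every other "c..."
```

With plain character ordering "chico" comes first; with the key above it
moves after "curva", because every word beginning with the symbol "ch"
sorts after every word beginning with a lone "c".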

Remember, we (native-)speakers of English are awfully spoiled by having a
reasonably regular alphabet.  It's reasonable to ask what things would
have been like had computers had their initial development in, say,
China or Japan, where the alphabet problem is much worse.  I think the
simple model of English may have sped things up initially, but it's now
turning into an impediment for dealing with the rest of the world.
(Oh well, we make up for a simple alphabet with hideously irrational
spelling, even discounting the British/American differences :-) ).

Donn Terry
HP, Ft. Collins.



More information about the Comp.unix.wizards mailing list