POSIX Regular Expression Funniness

Ken Arnold arnold at apollo.COM
Fri Jan 27 02:20:00 AEST 1989


The POSIX proposal [] has a rework of regular expressions.  In
particular, the character set expressions (things like "[a-z]") have had
a few new things added, but the way they have been added seems passing
strange.  I was wondering if I was alone in thinking the following
suboptimal:

They have added a new set of bracket expressions which stand for
pre-defined sets of characters.  For example, "[:alpha:]" is all
alphabetic characters, "[.ch.]" is the character string ch treated as a
single character (which is useful for sorting in many languages), and
"[=a=]" refers to all variants of a, i.e., a, a with a circumflex, a
with an umlaut, etc.

Well, this sounds fine and dandy.  Being able to express C variables as
"[[:alpha:]_][[:alnum:]_]*" is reasonably descriptive.  Being able to
say "I don't care if the 'o' has any diacritical marks" is also fine.
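
As a rough illustration of the C-variable expression above, here is a
minimal sketch using the regcomp()/regexec() interface from <regex.h>.
That interface is not mentioned in this post; it is assumed here as one
place where these bracket expressions can be tried.  The function name
is_c_identifier is hypothetical, and the ^ and $ anchors are added only
so that the whole string has to match.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `text` is, in its entirety,
 * a C identifier according to the bracket expressions quoted above.
 * The ^ and $ anchors are an addition for this sketch. */
bool is_c_identifier(const char *text)
{
    regex_t re;
    bool ok;

    /* Compile as a basic regular expression; REG_NOSUB because only
     * a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, "^[[:alpha:]_][[:alnum:]_]*$", REG_NOSUB) != 0)
        return false;

    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Under these assumptions, is_c_identifier("foo_1") and
is_c_identifier("_x") should hold, while "1abc" and "a-b" should be
rejected.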

The problem is that, for some reason, if you want to simply match any
alphabetic character, you *cannot* say "[:alpha:]".  Or, to be more
precise, that expression keeps exactly the meaning it has now: an
ordinary list of the characters :, a, l, p, and h.  If you say

	grep "+[:alnum:]+" file ...

you will print any line which has a "+" followed by one of :, a, l, n,
u, or m, followed by another "+".  If you want to match what it *looks*
like that expression would match, you have to say:

	grep "+[[:alnum:]]+" file ...

In other words, these new bracket expressions only have their new
meaning inside outer brackets.
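
The difference between the two grep commands above can be sketched in C
with regcomp()/regexec(), assuming a C library that implements the
bracket expressions as described and that, like the grep behavior
described here, treats a bare "[:alnum:]" as an ordinary character
list.  The helper name bre_matches is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the basic regular expression
 * `pattern` matches anywhere in `text`, much as grep would select a
 * line.  A pattern that fails to compile counts as "no match". */
bool bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool found;

    /* No REG_EXTENDED flag, so "+" is an ordinary character here,
     * as it is in grep's default syntax. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;

    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With this sketch, bre_matches("+[:alnum:]+", "+:+") should succeed and
bre_matches("+[:alnum:]+", "+x+") should fail, while the doubly
bracketed "+[[:alnum:]]+" should do the reverse.  Some tools may instead
reject the single-bracket form outright, so the silent mismatch is an
assumption about the implementation.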

Why?  The only existing expressions you would break if you allowed "top
level" [::] expressions (or [..] or [==] expressions) would be bracket
expressions that already begin *and* end with a colon (or a dot, or an
equals sign).  Since the second copy of that character is currently
pointless redundancy, I can't believe this is a serious problem.
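
To make the redundancy point concrete, the following sketch checks that
the single-bracket "[:alnum:]" and the same list with its trailing colon
dropped accept exactly the same one-character strings.  It again assumes
regcomp()/regexec() from <regex.h> and a library that reads the bare
form as a plain character list; both function names are hypothetical,
and only the characters 1 through 127 are tried.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the basic regular expression `pattern`
 * matches somewhere in `text`. */
bool bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool found;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* Hypothetical helper: true if patterns `a` and `b` agree on every
 * one-character string built from the codes 1..127. */
bool same_single_char_set(const char *a, const char *b)
{
    char text[2] = { 0, 0 };
    int c;

    for (c = 1; c < 128; c++) {
        text[0] = (char)c;
        if (bre_matches(a, text) != bre_matches(b, text))
            return false;
    }
    return true;
}
```

Under these assumptions,
same_single_char_set("^[:alnum:]$", "^[:alnum]$") should be true, which
is the sense in which the second colon adds nothing, while comparing
"^[:alnum:]$" against "^[[:alnum:]]$" should be false.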

What seems like a serious problem to me is that the required nesting
makes the new expressions more difficult to use.  Further, misuse of
them in this kind of obvious way leads to silent misbehavior from which
it is difficult to surmise the bug.

Is it just me, or is this wrong?

		Ken


