POSIX Regular Expression Funnyness

Doug Gwyn gwyn at smoke.BRL.MIL
Fri Jan 27 10:05:48 AEST 1989


In article <4118f7b1.ae48 at apollo.COM> arnold at apollo.COM (Ken Arnold) writes:
>The POSIX proposal [] has a rework of regular expressions.  ...
>"[.ch.]" is the character string ch treated as a single character
>(which is useful for sorting in many languages), ...

This seems totally wrong to me.  The pattern argument should consist
of what ANSI C terms "multibyte characters", in which case no special
indicators are required to take care of this.  It looks like somebody
wants to pander to existing sick implementations of foreign character
sets instead of moving toward everybody doing it right (or at least,
the same way!).
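
For readers unfamiliar with the quoted "[.ch.]" notation, here is a
minimal sketch in Python of what "ch treated as a single character"
means for sorting. It is not taken from the POSIX draft. The names
ALPHABET, collating_elements, and sort_key are hypothetical, and the
alphabet is a simplified lowercase one that assumes the traditional
Spanish convention of placing "ch" between "c" and "d".

```python
# Simplified, hypothetical alphabet: "ch" is one collating element
# ranked after "c" and before "d" (traditional Spanish convention).
ALPHABET = "a b c ch d e f g h i j k l m n o p q r s t u v w x y z".split()
RANK = {element: index for index, element in enumerate(ALPHABET)}
MULTI = [element for element in ALPHABET if len(element) > 1]


def collating_elements(word):
    """Split a lowercase word into collating elements, so "ch" stays whole."""
    elements = []
    i = 0
    while i < len(word):
        for multi in MULTI:
            if word.startswith(multi, i):
                elements.append(multi)
                i += len(multi)
                break
        else:
            # No multi-character element starts here; take one character.
            elements.append(word[i])
            i += 1
    return elements


def sort_key(word):
    """Sort key that ranks each collating element by its alphabet position."""
    return [RANK[element] for element in collating_elements(word)]


words = ["dama", "chico", "cuna"]
by_code_point = sorted(words)              # compares character by character
by_collation = sorted(words, key=sort_key)  # compares collating elements
```

With these assumptions, the plain sort puts "chico" before "cuna",
while the collating-element sort puts "cuna" first, because "ch" as a
unit ranks after every word beginning with a plain "c".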

>What seems like a serious problem to me is that the required nesting
>makes the new expressions more difficult to use.  Further, misuse of
>them in this kind of obvious way leads to silent misbehavior from which
>it is difficult to surmise the bug.

More interestingly, the proposal is still not fully upward compatible,
because existing greps already assign a meaning to '[[:alpha:]]'.  If an
incompatible change is to be made, it is best to engineer it carefully
rather than worry about preserving compatibility with existing practice,
since that isn't going to be attained anyway.
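
To make the existing meaning concrete: a traditional bracket expression
ends at the first ']' that does not immediately follow the opening '['
(or '[^'), so '[[:alpha:]]' reads as the set of characters '[', ':',
'a', 'l', 'p', 'h' followed by a literal ']'. The sketch below spells
that reading out with explicit escapes in Python's re module, which has
no POSIX character classes. The names traditional and old_style_match
are hypothetical, and this is an illustration of the parsing rule, not
the code of any particular grep.

```python
import re

# Pre-POSIX reading of [[:alpha:]], written with escapes so the intent
# is unambiguous: one character from the set "[:alph", then a literal "]".
traditional = re.compile(r"[\[:alph]\]")


def old_style_match(text):
    """Return True if the whole text matches the traditional reading."""
    return traditional.fullmatch(text) is not None


results = {
    "a]": old_style_match("a]"),  # 'a' is in the set, then literal ']'
    ":]": old_style_match(":]"),  # ':' is in the set, then literal ']'
    "a": old_style_match("a"),    # no trailing ']', so no match
    "z]": old_style_match("z]"),  # 'z' is not in the set
}
```

Under this reading a lone letter such as "a" does not match at all,
which is the opposite of what the new character-class meaning intends.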

I hope the 1003.2 guys are looking into Rob Pike's extended regular
expressions as used in "sam", or the ones the "gre" implementors have
come up with.  There is a LOT of existing practical experience that
should be drawn on.

The more pressing question is, why is a standard for this being
attempted if it's premature?



More information about the Comp.unix.wizards mailing list