what should egrep '|root' /etc/passwd print?

Larry Wall lwall at jpl-devvax.JPL.NASA.GOV
Tue Sep 20 08:49:58 AEST 1988


In article <5060 at watdcsu.waterloo.edu> dmcanzi at watdcsu.waterloo.edu (David Canzi) writes:
: As for utility, consider the case, which I have actually run into,
: where I wanted an expression like 'aa(|bb)cc' to match the strings
: 'aacc' and 'aabbcc'.  In this case, it's clear I want the expression
: in parentheses to match the null string.  The program I was using
: wouldn't let me do this, and I had to use something like 'a(a|abb)cc'
: to get what I wanted.  If I had had a program generate that expression,
: I would have had to add code to detect this special case and rewrite
: the regular expression.  Yecch.

Interestingly enough, in Henry Spencer's regexp routines (which I borrowed
for perl), if you say /aa(bb)?cc/, it gets translated internally to
the equivalent /aa(bb|)cc/.
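
A minimal sketch of that equivalence, using Python's `re` module as a stand-in (an assumption: it is not Henry Spencer's regexp code or perl, and the names `PATTERNS`, `SAMPLES`, `accepts` and `verdicts` are made up for illustration). It only checks that the three spellings accept and reject the same sample strings; it says nothing about how any engine rewrites the pattern internally.

```python
import re

# Three spellings of "aa, then an optional bb, then cc".
PATTERNS = ["aa(bb)?cc", "aa(bb|)cc", "aa(|bb)cc"]

# Hypothetical test strings: two that should match, three that should not.
SAMPLES = ["aacc", "aabbcc", "aabcc", "aabbbbcc", "cc"]


def accepts(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


def verdicts(pattern):
    """Return one accept/reject verdict per sample string."""
    return [accepts(pattern, s) for s in SAMPLES]


if __name__ == "__main__":
    for p in PATTERNS:
        print(p, verdicts(p))
```

If the empty alternative is treated as matching the null string, all three patterns should produce the same list of verdicts, which is what David's generated expression needed.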

The null string should match anything because the whole idea of regular
expressions involves rejecting strings that you can't match.  To match /abc/,
you say "For each of the next N characters, bomb out if it doesn't match.
Otherwise it matches."  You don't go and change the rules just because N
happens to be 0 sometimes.
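
The "bomb out" rule can be sketched as a literal-string matcher. This is an illustrative sketch only, assuming plain literal patterns with no metacharacters; `matches_at` and `search` are hypothetical names, not functions from any regexp library.

```python
def matches_at(pattern, text, start):
    """Check a literal pattern against text beginning at index start.

    For each of the N pattern characters, bomb out if it does not match.
    When N is 0 the loop body never runs, so nothing can bomb out.
    """
    for i in range(len(pattern)):
        if start + i >= len(text) or text[start + i] != pattern[i]:
            return False  # bomb out
    return True  # otherwise it matches


def search(pattern, text):
    """Return the first index where pattern matches, or -1 if none does."""
    for start in range(len(text) + 1):
        if matches_at(pattern, text, start):
            return start
    return -1


if __name__ == "__main__":
    print(search("abc", "xxabc"))
    print(search("", "root:x:0:0"))
    print(search("", ""))
```

No special case for the empty pattern appears anywhere in the code: it matches at position 0 of every string, including the empty string, simply because there is no character left to reject.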

If you DO change the rules on boundary conditions, people who write program
generators will hate you forever, as David mentioned.  I know, I've been
there.  "Whaddya mean, I can't declare an array of size 0?"

Or look at it another way.  As the pattern gets shorter and shorter,
it matches more and more things.  When it gets as short as it can,
it ought to match as many things as it can, by the Principle of Least
Surprise.
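
A rough illustration of the "shorter matches more" argument, again in Python as a stand-in for egrep (an assumption). The lines in `LINES` are made-up passwd-style entries, not real data, and `count_matching` is a hypothetical helper. The last function tries the pattern from the subject line, read as "null string or root".

```python
import re

# Made-up passwd-style lines, for illustration only.
LINES = [
    "root:x:0:0:root:/root:/bin/sh",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
    "robin:x:1000:1000::/home/robin:/bin/sh",
    "alice:x:1001:1001::/home/alice:/bin/sh",
]


def count_matching(pattern, lines):
    """Count the lines in which pattern matches somewhere, grep-style."""
    return sum(1 for line in lines if re.search(pattern, line))


def shrinking_counts(word, lines):
    """Match counts for word, then word minus its last character, and so on
    down to the empty pattern."""
    prefixes = [word[:n] for n in range(len(word), -1, -1)]
    return [count_matching(re.escape(p), lines) for p in prefixes]


def empty_alternative_count(lines):
    """Count lines matched by '|root', i.e. the null string or 'root'."""
    return count_matching("|root", lines)


if __name__ == "__main__":
    print(shrinking_counts("root", LINES))
    print(empty_alternative_count(LINES))
```

Under this reading the counts never go down as the pattern shrinks, and both the empty pattern and '|root' select every line.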

Let's hear it for intuitionalization.

Larry Wall
lwall at jpl-devvax.jpl.nasa.gov



More information about the Comp.unix.wizards mailing list