Regular expression question.

Tom Poage poage at sunny.UUCP
Fri Feb 24 04:06:14 AEST 1989


Is there a reason why I can't find regular expression routines
that support both alternation and explicit repetition counts
(intervals)?  Here's what I mean ...

In some public-domain regexp routines I can use

	(string1|string2)

In other routines I can use

	(something){3,4}

However, I have never seen routines with the ability to use 
these two constructs together, such as

	(x|y|(z){4,5})
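For what it's worth, an engine that implements both constructs has no trouble combining them. Here is a minimal check using Python's `re` module, which is a modern engine and not one of the public-domain routines discussed here. It uses the pattern above and requires the whole string to match:

```python
import re

# Alternation with an interval inside one branch, exactly as written above.
pattern = re.compile(r"(x|y|(z){4,5})")

def whole_match(s: str) -> bool:
    """Return True if the entire string matches the pattern."""
    return pattern.fullmatch(s) is not None

for s in ["x", "y", "zzzz", "zzzzz", "zzz", "zzzzzz", "xy"]:
    print(f"{s!r}: {whole_match(s)}")
```

Only "x", "y", and runs of four or five "z" characters are accepted.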

For example, I want to find strings of 9 digits occurring
in a certain pattern, similar to:

875000000-876000000,786992210,>789922119

The (GNU) regexp routine I currently use requires the following
pattern to match the line above.  The pattern is really a single
line; it has been split here for readability.

^((([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]))
(,(([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]))*$
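Much of the unwieldiness comes from spelling out `[0-9]` nine times in four places. One workaround is to write the item pattern once and generate the long form, which contains no `{n}` intervals. The sketch below does this in Python, but any language with string handling would do. It does nothing about the SunOS egrep overflow, since the generated string is just as long. The names `DIGITS9`, `ITEM`, and `LINE` are illustrative only.

```python
import re

# Nine "[0-9]" classes in a row, as an egrep without {n} intervals needs.
DIGITS9 = "[0-9]" * 9

# One list item: an optional <, >, <=, or >= followed by a nine-digit
# number, or a range of two nine-digit numbers.
ITEM = f"((([<>](=)?)?{DIGITS9})|({DIGITS9}-{DIGITS9}))"

# A whole line: one item, then zero or more comma-prefixed items.
# ITEM is a single group, so the comma applies to both of its alternatives.
LINE = "^" + ITEM + "(," + ITEM + ")*$"

line_re = re.compile(LINE)

print(LINE)  # the long, interval-free pattern
print(line_re.match("875000000-876000000,786992210,>789922119") is not None)
```

The printed string can be handed to a grep that lacks intervals. Python's `re` is used here only to check it against the example line.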

The first problem is that this regexp overflows the grep and egrep
of SunOS 3.5 (however, GNU's e?grep handles it just fine).  The
second is that it is unwieldy.  The third is that I don't
necessarily want to split the line into fragments and match each
one separately.

Why can't I do something like this (still split)?

^((([<>](=)?)?[0-9]{9})|([0-9]{9}-[0-9]{9}))(,((([<>](=)?)?[0-9]{9})|
([0-9]{9}-[0-9]{9})))*$
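Here is a quick sketch showing that the compact form works as written on an engine that supports both alternation and intervals. It uses Python's `re` module as a stand-in for such an engine. Each comma-prefixed item is grouped so that the comma covers ranges as well as single numbers. Apart from the line quoted in the post, the sample strings are made up for illustration.

```python
import re

# One item with {9} intervals: an optional <, >, <=, or >= and a
# nine-digit number, or a range of two nine-digit numbers.
ITEM = r"((([<>](=)?)?[0-9]{9})|([0-9]{9}-[0-9]{9}))"

# One item, then zero or more comma-prefixed items, anchored at both ends.
LINE = re.compile("^" + ITEM + "(," + ITEM + ")*$")

def matches(s: str) -> bool:
    """Return True if the whole line is a valid comma-separated list."""
    return LINE.match(s) is not None

samples = {
    "875000000-876000000,786992210,>789922119": True,   # line from the post
    "786992210,875000000-876000000,<=100000000": True,  # range after a comma
    "875000000-876000000876000000-877000000": False,    # comma missing
    "12345678": False,                                  # only eight digits
}

for s, expected in samples.items():
    print(f"{s}: {matches(s)}")
    assert matches(s) == expected
```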

Don't you agree this is easier ":-):-):-)" to read?

Is this only a difference between System V and BSD variants?
Is there a public-domain version of regexp(3) with these 
features merged?  I await with bated breath.  Tom.
-- 
Tom Poage, UCDMC Clinical Engineering, Sacto., CA
poage at sunny.ucdavis.edu
...!ucbvax!ucdavis!sunny!poage



More information about the Comp.unix.questions mailing list