[thelist] RegExp: extracting a substring

Kelly Hallman khallman at wrack.org
Thu Sep 11 18:09:31 CDT 2003


On Thu, 11 Sep 2003, Tab Alleman wrote:
> For some reason it didn't like the "\b?" in the middle of the
> "[0-9]{1,2}\b?[A-Z]{3}" part, but "\s?" is ok.

\b is a word boundary anchor, similar to anchors like ^ and $: it matches a position, not a character.
\b matches a position between a \w and a \W (or between a \w and the start or end of the string).
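A quick way to see where \b matches is to list every position it finds. This is a minimal sketch using Python's re module (assumed here; the engine from the original thread may differ in details), with a hypothetical helper named boundary_positions:

```python
import re

def boundary_positions(text):
    # Hypothetical helper. \b is zero-width, so every match is empty;
    # start() reports the position where the boundary sits.
    return [m.start() for m in re.finditer(r"\b", text)]

print(boundary_positions("52 KLW"))  # [0, 2, 3, 6]
print(boundary_positions("64LIR"))   # [0, 5]
```

In "52 KLW" there are boundaries on both sides of the space, while "64LIR" only has them at the very start and end.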

/[0-9]{1,2}\b?[A-Z]{3}/
is incorrect because it would match something like
02BEK but never 52 KLW, since nothing in it accounts for the whitespace.
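In Python at least, a quantified anchor like \b? is rejected outright ("nothing to repeat"), which may be what the error above was about. And because an optional zero-width anchor can always match zero times, the pattern behaves like one with the \b simply dropped. A sketch of both points, assuming Python's re module:

```python
import re

# Python refuses to put a quantifier on an anchor such as \b
try:
    re.compile(r"[0-9]{1,2}\b?[A-Z]{3}")
    rejected = False
except re.error:
    rejected = True

# An optional \b can always match zero times, so this is what \b? amounts to
no_space = re.compile(r"[0-9]{1,2}[A-Z]{3}")

print(rejected)                        # True
print(bool(no_space.search("02BEK")))  # True
print(bool(no_space.search("52 KLW"))) # False: nothing consumes the space
```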

There would never be a boundary inside 64LIR, because its characters are all \w.
\b? would still allow a match on ##XXX (digits running straight into letters), since the ? makes it optional.
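Making the \b mandatory shows the other half of the argument: a boundary can never fall between the digits and letters of 64LIR, and since nothing consumes whitespace, 64 LIR fails too. A sketch assuming Python's re module; the name required is just for illustration:

```python
import re

required = re.compile(r"[0-9]{1,2}\b[A-Z]{3}")

# No \w/\W transition between "4" and "L", so \b cannot match there
print(required.search("64LIR"))   # None
# \b matches after "64", but the space itself is never consumed
print(required.search("64 LIR"))  # None
```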

/[0-9]{1,2}\b\s+\b[A-Z]{3}/
is probably how you would do it, but the \b's are superfluous: \s+ sits between a digit and a letter, so both boundaries are already implied.
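Since \s+ must consume at least one whitespace character between a digit and a letter, the boundary on each side of it is guaranteed. A Python sketch checking that both forms agree on a handful of sample strings (the samples are made up for illustration):

```python
import re

with_bounds = re.compile(r"[0-9]{1,2}\b\s+\b[A-Z]{3}")
without_bounds = re.compile(r"[0-9]{1,2}\s+[A-Z]{3}")

# Hypothetical sample inputs covering the cases in this thread
samples = ["52 KLW", "52   KLW", "02BEK", "64LIR", "7 ABC"]

for s in samples:
    a = with_bounds.search(s)
    b = without_bounds.search(s)
    # Both patterns should agree on whether, and what, they match
    assert (a and a.group()) == (b and b.group()), s
    print(f"{s!r}: {a.group() if a else None}")
```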

> The pattern that finally worked was:  "^PER NIGHT STARTING
> [0-9]{1,2}\s?[A-Z]{3} FOR (\d{1,2}) NIGHTS?$"

The regex you ended up with will match both 24LER and 62 IEP.
Also be aware that if it were 001 nights, no match would be made, since
(\d{1,2}) allows at most two digits. It'd probably be safe to use (\d+).
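The final pattern can be checked against these cases directly. A Python sketch; the input lines are hypothetical, and the only change in the relaxed version is (\d{1,2}) becoming (\d+):

```python
import re

final = re.compile(r"^PER NIGHT STARTING [0-9]{1,2}\s?[A-Z]{3} FOR (\d{1,2}) NIGHTS?$")
relaxed = re.compile(r"^PER NIGHT STARTING [0-9]{1,2}\s?[A-Z]{3} FOR (\d+) NIGHTS?$")

# Hypothetical input lines
lines = [
    "PER NIGHT STARTING 24LER FOR 3 NIGHTS",
    "PER NIGHT STARTING 62 IEP FOR 1 NIGHT",
    "PER NIGHT STARTING 62 IEP FOR 001 NIGHTS",
]

for line in lines:
    m1 = final.search(line)
    m2 = relaxed.search(line)
    print(line)
    print("  final:  ", m1.group(1) if m1 else "no match")
    print("  relaxed:", m2.group(1) if m2 else "no match")
```

Both versions accept the first two lines; only the relaxed one captures "001" from the third.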

-- 
Kelly Hallman
http://wrack.org/

