[thelist] Regex help

Lee Kowalkowski lee.kowalkowski at googlemail.com
Thu May 3 05:10:24 CDT 2007


On 02/05/07, Tab Alleman <talleman at lumpsum.com> wrote:
> Except I was overly subtle in describing my string.  The string begins and ends with any combination of characters which could include START, so there might be more than two STARTs in the string.  I really need to specifically grab instances of START that are followed by END before another START.

Phew. Firstly, I'm not sure which regex syntax/flavour you want, but I
don't think ^ outside of [] will work as a negation, because there it
means the beginning of input.  Perhaps ! does it (I think it negates a
pattern in mod_rewrite).

Your problem is one I have seen many times, and have not been able to
solve in a single regEx.  I tend to go for a serial regEx approach,
something like:

1. Replace all START with \nSTART and END with END\n.
2. Do a global search for START(.*)END.  This works because '.' matches
anything but the newline character, so a match cannot run past another
START or END.  You only need the brackets if you want to backreference
the text without the START and END.  A global search should return an
array, so if you want the second instance, it will be the second
element of the array.
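
Steps 1 and 2 could be sketched like this in Python.  This is only an
illustration under assumptions: the function name find_pairs and the
sample string are made up, the input is assumed to contain no newlines
yet, and your regex flavour may differ in details.

```python
import re

def find_pairs(text):
    # Step 1: put every START at the beginning of a line and every END
    # at the end of one (assumes text has no newlines of its own).
    marked = text.replace("START", "\nSTART").replace("END", "END\n")
    # Step 2: '.' does not match a newline by default, so a match can
    # never run past another START or END.
    return re.findall(r"START(.*)END", marked)

sample = "xx START a START b END c START d END yy"
print(find_pairs(sample))  # [' b ', ' d ']
```

The first START in the sample is skipped because another START comes
before any END, which is the behaviour asked for.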

That's fine for searching.  If you're doing search/replace, you'll
perhaps also need:

3. Replace all \nSTART with START and END\n with END.
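
A search/replace version might look like the sketch below (Python, with
a made-up function name replace_pairs, and assuming the input has no
newlines of its own).  One deviation from the steps as written, and it
is my assumption rather than part of the recipe: the pattern includes
the inserted newlines, so that a replacement which drops START and END
does not leave stray newlines behind.

```python
import re

def replace_pairs(text, replacement):
    # Step 1: isolate each START and END with an inserted newline.
    marked = text.replace("START", "\nSTART").replace("END", "END\n")
    # Step 2: replace each START...END pair, together with the two
    # newlines inserted around it (assumed variation, see above).
    changed = re.sub(r"\nSTART(.*)END\n", replacement, marked)
    # Step 3: take the inserted newlines back out of whatever did not match.
    return changed.replace("\nSTART", "START").replace("END\n", "END")

print(replace_pairs("xx START a START b END c", "[X]"))
# xx START a [X] c
```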

If your source string already contains newlines and START and END can
be on different lines, you'll need to start by replacing all \n with
\x01 (or another character or combination of characters that you're
confident won't be present), and swap the newlines back in
afterwards.
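
For input that already contains newlines, a sketch along these lines
may help (Python again; find_pairs_multiline and PLACEHOLDER are
made-up names, and \x01 is only assumed to be absent, so the code
checks for it first).

```python
import re

PLACEHOLDER = "\x01"  # assumed not to occur in the input

def find_pairs_multiline(text):
    if PLACEHOLDER in text:
        raise ValueError("placeholder already present; pick another one")
    # Hide the real newlines so only the inserted ones act as separators.
    protected = text.replace("\n", PLACEHOLDER)
    marked = protected.replace("START", "\nSTART").replace("END", "END\n")
    found = re.findall(r"START(.*)END", marked)
    # Swap the real newlines back into each captured piece.
    return [piece.replace(PLACEHOLDER, "\n") for piece in found]

print(find_pairs_multiline("START a\nSTART b\nc END d"))  # [' b\nc ']
```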

That's what I tend to do anyway...

-- 
Lee


