[thelist] ASP/RegExp
Jay Dorsey
evolt at jaydorsey.com
Thu Oct 3 13:14:01 CDT 2002
Chris,
I'm a bit confused as to what you want exactly - do you want exactly one
match, or three?
If it's pulling the entire text (which I do believe it is), it's because the
regex is being greedy.
If you break down the regex (I would break it into three parts) and 'read'
it, you have:
<<wttoc>> reads "match me the text '<<wttoc>>' exactly"
([\n]|.)* reads "match me a newline, OR any one character, zero or more
times (so it matches any character at all, including a new line, or even
the characters of '<</wttoc>>')"
<</wttoc>> reads "match me '<</wttoc>>' exactly"
In the example you have listed below, it will grab the first '<<wttoc>>',
the last '<</wttoc>>', and anything in between (which CAN include
'<<wttoc>>' and/or '<</wttoc>>' because those meet the 'any character,
including a new line' requirement). Your regex, as I noted above, is being
greedy.
Try dropping off the final <</wttoc>> in your sample string and you'll see
how the regex stops its match at the second <</wttoc>>.
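To make the greedy behaviour concrete, here is a minimal sketch in Python. Python's re module is only a stand-in here (it is not ASP's RegExp object or CF), and the variable names are mine, but the pattern and sample text are the ones from this thread.

```python
import re

# Sample text from the quoted message: three tagged lines
# separated by untagged lines.
sample = (
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>"
)

pattern = re.compile(r"<<wttoc>>([\n]|.)*<</wttoc>>")

# Greedy: a single match that runs from the first opening tag
# to the last closing tag.
greedy_matches = [m.group(0) for m in pattern.finditer(sample)]
print(len(greedy_matches))          # 1
print(greedy_matches[0] == sample)  # True

# Drop the final closing tag: the engine backtracks to the last
# closing tag it can still find, which is now the second one.
truncated = sample[: -len("<</wttoc>>")]
truncated_match = pattern.search(truncated).group(0)
print(repr(truncated_match))
```

The last print shows the match ending at the second <</wttoc>>, with the trailing unclosed "<<wttoc>>Hello There" left out.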
A good example of this is stripping HTML tags:
teststring = 'this <b>is</b> a <i>test</i>'
sPattern = "<[^>]*>" will strip out tags properly, however
sPattern = "<.*>" will leave you with 'this ' (try it out)
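The same comparison as a runnable sketch, again using Python's re module as a stand-in for the ASP or CF engine (the variable names are mine):

```python
import re

teststring = "this <b>is</b> a <i>test</i>"

# [^>]* cannot run past a '>', so each tag is matched on its own.
stripped_ok = re.sub(r"<[^>]*>", "", teststring)

# .* is greedy: it runs from the first '<' to the last '>' in the string.
stripped_greedy = re.sub(r"<.*>", "", teststring)

print(repr(stripped_ok))      # 'this is a test'
print(repr(stripped_greedy))  # 'this '
```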
In CF I know you're not able to put a group inside of a class such as
[^(<</wttoc>>)] (not even sure if it's valid regex syntax in any flavor to
be honest) - but you could always give it a shot. If it works, your regex
would look like this:
sPattern = "<<wttoc>>[^(<</wttoc>>)]*<</wttoc>>"
Kind of a long shot - but like I said, I'm not sure the syntax supports it
to work as I think that it would. Maybe there's a regex expert in the list
that could clue me in as to why groups don't work in classes, with a ^ (I
honestly would like to know).
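For what it's worth, here is how that pattern behaves in Python's re module (an assumption on my part that it is representative; I have not run it in CF or ASP). There the parentheses inside the brackets are treated as ordinary characters, so the class is just a set of single characters to exclude, not the string '<</wttoc>>' as a unit.

```python
import re

# Inside [...] the parentheses are ordinary characters, so this class
# means "any one character except ( ) < > / w t o c".
pattern = re.compile(r"<<wttoc>>[^(<</wttoc>>)]*<</wttoc>>")

# "Hello There" contains 'o', which is in the excluded set: no match.
no_match = pattern.search("<<wttoc>>Hello There<</wttoc>>")
print(no_match)  # None

# "Hi all" contains none of the excluded characters: it matches.
short_match = pattern.search("<<wttoc>>Hi all<</wttoc>>")
print(short_match.group(0))  # <<wttoc>>Hi all<</wttoc>>
```

So in this flavor at least, the pattern compiles but rejects any content containing one of those letters, which is not what the sample text needs.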
Even using a quantifier on it such as (<<wttoc>>([\n]|.)*<</wttoc>>){1} will
fail I believe; it should still grab as much as it can (only one time
though).
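A quick check of the {1} idea, sketched in Python's re module as a stand-in. The second pattern uses a lazy quantifier (*?), which is my own addition and not something proposed above; whether ASP's RegExp object accepts it depends on the scripting engine version, so treat it as something to try rather than a sure fix.

```python
import re

sample = (
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>"
)

# {1} only says "this group, once"; the * inside is still greedy,
# so the single match still covers the whole sample.
once = re.search(r"(<<wttoc>>([\n]|.)*<</wttoc>>){1}", sample).group(0)
print(once == sample)  # True

# Assumption, not from the original message: a lazy *? stops at the
# first closing tag it can reach, giving one match per tagged line.
lazy = [
    m.group(0)
    for m in re.finditer(r"<<wttoc>>([\n]|.)*?<</wttoc>>", sample)
]
print(len(lazy))  # 3
```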
Hope this helps clarify some of it.
jay
-----Original Message-----
From: thelist-admin at lists.evolt.org
[mailto:thelist-admin at lists.evolt.org]On Behalf Of Chris Marsh
Sent: Thursday, October 03, 2002 1:58 PM
To: thelist at lists.evolt.org
Subject: RE: [thelist] ASP/RegExp
Jay
> I'm not terribly familiar w/ ASP's flavor of regex, as I
> mostly deal w/ ColdFusion, but I do believe a period inside
> of a class is a literal.
>
> Maybe try
>
> sPattern = "<<wttoc>>([\n]|.)*<</wttoc>>"
That's great, thanks! My next problem however, is that this matches the
following entire string:
<<wttoc>>Hello There<</wttoc>>
This is cool.
<<wttoc>>Hello There<</wttoc>>
This is cool.
<<wttoc>>Hello There<</wttoc>>
Whereas I want it to match:
<<wttoc>>Hello There<</wttoc>>
three times. Any pointers?
Many thanks in advance.
Regards
Chris Marsh