[thelist] ASP/RegExp

Jay Dorsey evolt at jaydorsey.com
Thu Oct 3 13:14:01 CDT 2002


Chris,

I'm a bit confused as to what you want exactly - do you want exactly one
match, or three?

If it's pulling the entire text (which I do believe it is), it's because the
regex is being greedy.

If you break down the regex (I would break it into three parts) and 'read'
it, you have:

<<wttoc>> reads "match me the text '<<wttoc>>' exactly"
([\n]|.)* reads "match me a newline, OR any one character, zero or more
times (matches any character, including a new line, or even
<</wttoc>>)"
<</wttoc>> reads "match me '<</wttoc>>' exactly"

In the example you have listed below, it will grab the first '<<wttoc>>',
the last '<</wttoc>>', and anything in between (which CAN include
'<<wttoc>>' and/or '<</wttoc>>' because those meet the 'any character,
including a new line' requirement).  Your regex, as I noted above, is being
greedy.

Try dropping off the final <</wttoc>> in your sample string and you'll see
how the regex stops its match at the second <</wttoc>>.
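
Here is a rough way to see that for yourself. It is a sketch in Python rather
than ASP, on the assumption that the greedy behaviour of this particular
pattern is the same in both engines. The variable names (sample, truncated
and so on) are only for illustration.

```python
import re

# Chris's sample string, rebuilt line by line.
sample = (
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>"
)

greedy = re.compile(r"<<wttoc>>([\n]|.)*<</wttoc>>")

# Greedy: runs from the first opening tag to the LAST closing tag.
full_match = greedy.search(sample).group(0)

# Drop the final closing tag; the match now ends at the second closing tag.
truncated = sample[: -len("<</wttoc>>")]
short_match = greedy.search(truncated).group(0)

print(full_match == sample)
print(short_match.count("<</wttoc>>"))
```

The first print shows the match is the entire string; the second shows that,
with the last closing tag removed, the match still swallows two closing tags.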

A good example of this is stripping HTML tags:

teststring = 'this <b>is</b> a <i>test</i>'

sPattern = "<[^>]*>" will strip out tags properly, however
sPattern = "<.*>" will leave you with 'this ' (try it out)
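
If you want to try it out quickly, here is a minimal sketch in Python (not
ASP, so treat the function calls as an assumption about your environment; the
two patterns themselves are the ones above).

```python
import re

teststring = "this <b>is</b> a <i>test</i>"

# Negated class: each match stops at the first '>' after a '<'.
stripped_ok = re.sub(r"<[^>]*>", "", teststring)

# Greedy dot: one match from the first '<' to the last '>'.
stripped_greedy = re.sub(r"<.*>", "", teststring)

print(repr(stripped_ok))      # 'this is a test'
print(repr(stripped_greedy))  # 'this '
```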

In CF I know you're not able to put a group inside of a class such as
[^(<</wttoc>>)] (not even sure if it's valid regex syntax in any flavor to
be honest) - but you could always give it a shot.  If it works, your regex
would look like this:

sPattern = "<<wttoc>>[^(<</wttoc>>)]*<</wttoc>>"

Kind of a long shot - but like I said, I'm not sure the syntax supports it
to work as I think that it would.  Maybe there's a regex expert in the list
that could clue me in as to why groups don't work in classes, with a ^ (I
honestly would like to know).
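
For what it's worth, here is a sketch of what that class does in Python's
regex flavor (an assumption on my part that CF and ASP treat it the same way):
inside [...] the parentheses are ordinary characters, so the class is just a
list of single characters to exclude. The negative lookahead at the end is one
way to say "anything that is not the closing tag"; whether your engine
supports lookahead is also an assumption.

```python
import re

sample = (
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>"
)

# The class means "any one character except ( < / w t o c > )".
class_pattern = re.compile(r"<<wttoc>>[^(<</wttoc>>)]*<</wttoc>>")

# "Hello" contains an 'o', which the class excludes, so nothing matches.
class_on_sample = class_pattern.findall(sample)

# It only matches when the body avoids every excluded character.
class_on_plain = class_pattern.findall("<<wttoc>>Hi And Bye<</wttoc>>")

# Negative lookahead: consume one character at a time, as long as the
# closing tag does not start at the current position.
lookahead = re.compile(r"<<wttoc>>(?:(?!<</wttoc>>)[\s\S])*<</wttoc>>")
lookahead_matches = lookahead.findall(sample)

print(class_on_sample)
print(class_on_plain)
print(len(lookahead_matches))
```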

Even using a quantifier on it such as (<<wttoc>>([\n]|.)*<</wttoc>>){1} will
fail I believe; it should still grab as much as it can (only one time
though).
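
A quick sketch of that, again in Python rather than ASP. The {1} version
still takes the whole string. The second pattern uses a lazy quantifier (*?),
which takes as little as it can; I am assuming your regex engine supports
lazy quantifiers, which depends on the engine and its version.

```python
import re

sample = (
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>\n"
    "This is cool.\n"
    "<<wttoc>>Hello There<</wttoc>>"
)

# {1} only says "this group once"; the * inside it is still greedy.
once = re.compile(r"(<<wttoc>>([\n]|.)*<</wttoc>>){1}")
once_match = once.search(sample).group(0)

# Lazy version: *? stops at the first closing tag it can reach.
# (?:...) is a non-capturing group so findall returns whole matches.
lazy = re.compile(r"<<wttoc>>(?:[\n]|.)*?<</wttoc>>")
lazy_matches = lazy.findall(sample)

print(once_match == sample)
print(lazy_matches)
```

With the lazy pattern you get the three separate
'<<wttoc>>Hello There<</wttoc>>' matches Chris is after.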

Hope this helps clarify some of it.

jay



-----Original Message-----
From: thelist-admin at lists.evolt.org
[mailto:thelist-admin at lists.evolt.org]On Behalf Of Chris Marsh
Sent: Thursday, October 03, 2002 1:58 PM
To: thelist at lists.evolt.org
Subject: RE: [thelist] ASP/RegExp


Jay

> I'm not terribly familiar w/ ASP's flavor of regex, as I
> mostly deal w/ ColdFusion, but I do believe a period inside
> of a class is a literal.
>
> Maybe try
>
> sPattern = "<<wttoc>>([\n]|.)*<</wttoc>>"

That's great, thanks! My next problem however, is that this matches the
following entire string:

<<wttoc>>Hello There<</wttoc>>
This is cool.
<<wttoc>>Hello There<</wttoc>>
This is cool.
<<wttoc>>Hello There<</wttoc>>

Whereas I want it to match:

<<wttoc>>Hello There<</wttoc>>

three times. Any pointers?

Many thanks in advance.

Regards

Chris Marsh


--
For unsubscribe and other options, including
the Tip Harvester and archive of thelist go to:
http://lists.evolt.org Workers of the Web, evolt !



