[thelist] PHP RegEx
Anthony Baratta
anthony at baratta.com
Wed Jul 11 23:04:09 CDT 2007
Jon Molesa wrote:
> Wow, thank you for taking the time.
>
>> Here's an option:
>>
>> (?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)
>>
>> Returns:
>>
>> group(0) www.Company.com" target="_new">Company
>
> Can you explain to me why Group(0) is returned at all?
I'll do my best. I'm just a basic hack with RegEx.
Here's the full RegEx again:
(?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)
The first clause is: (?<=<a href="https?://) which will match
<a href="http://
or
<a href="https://
That text is not kept in the match, though, because the clause uses the
lookbehind option ?<=.
The last clause is: (?=</a>) which will match
</a>
but does not keep it in the match, because the clause uses the lookahead
option ?=.
So with those as your opening and closing clauses, the RegEx matches
everything between them. Group 0 is always the entire matched text, no
matter what groups the pattern defines, hence the first match [0] of:
www.Company.com" target="_new">Company
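Here is a minimal sketch of that behaviour using Python's re module, not the original PHP. One adaptation is an assumption on my part: Python only accepts fixed-width lookbehind, so the optional "s" in https? is written as two alternative lookbehinds. The sample HTML string is made up for illustration.

```python
import re

# Python's re needs fixed-width lookbehind, so "https?" is split into
# two alternative lookbehinds (an adaptation of the pattern above).
PATTERN = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'  # required before, not kept
    r'(.[^"]*)(" target="_new">)(.[^<]*)'                # the middle clause
    r'(?=</a>)'                                          # required after, not kept
)

html = '<a href="http://www.Company.com" target="_new">Company</a>'  # sample input

match = PATTERN.search(html)
whole_match = match.group(0)  # group 0 is the entire matched text

print(whole_match)  # www.Company.com" target="_new">Company
```

The opening <a href="http:// and the closing </a> have to be present for the match to succeed, but neither shows up in group 0.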
The middle clause is broken into three sections for matching purposes:
(.[^"]*)
One character of any kind (the dot), then any run of characters that are
not a double quote, because the double quote starts the next specific
match. This returns match [1].
(" target="_new">)
Match this text string specifically. This is match [2].
(.[^<]*)
One character of any kind (the dot), then any run of characters that are
not a less than sign, because the less than sign starts the next
specific match. This returns match [3].
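The three numbered groups can be inspected the same way. This is again a Python sketch under the same assumption (the lookbehind is split in two because Python's re requires fixed width), and the https sample link is hypothetical.

```python
import re

# Same pattern, with the lookbehind adapted for Python's fixed-width rule.
PATTERN = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)'            # group 1: the address, up to the closing quote
    r'(" target="_new">)'  # group 2: this literal text
    r'(.[^<]*)'            # group 3: the link text, up to the next "<"
    r'(?=</a>)'
)

html = '<a href="https://www.Company.com" target="_new">Company</a>'  # sample input

match = PATTERN.search(html)
address, literal, link_text = match.groups()  # groups 1, 2 and 3 in order

print(address)    # www.Company.com
print(literal)    # " target="_new">
print(link_text)  # Company
```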
One thing that would improve this RegEx is tweaking it so that
(" target="_new">) is not returned as one of the match groups. I'm not
exactly sure how to do that.
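One possible approach, offered as a sketch rather than a tested PHP answer, is a non-capturing group: writing (?:...) instead of (...) still requires the text to be there but does not assign it a group number. The Python below assumes the same split lookbehind as a workaround for Python's fixed-width rule, and the sample link is made up.

```python
import re

# (?:" target="_new">) must still match, but it no longer gets a number,
# so only the address and the link text come back as groups.
PATTERN = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)'              # group 1: the address
    r'(?:" target="_new">)'  # matched but not captured
    r'(.[^<]*)'              # group 2: the link text
    r'(?=</a>)'
)

html = '<a href="http://www.Company.com" target="_new">Company</a>'  # sample input

match = PATTERN.search(html)
groups = match.groups()

print(groups)          # ('www.Company.com', 'Company')
print(match.group(0))  # www.Company.com" target="_new">Company
```

Group 0 still contains the target text, since it is always the whole match. Only the numbered groups change.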
You can study the following post I previously did for another RegEx,
which uses the same pattern - but does not have the multiple matches in
the middle clause.
http://lists.evolt.org/archive/Week-of-Mon-20070625/190594.html
Hope that helps.
P.S. Several good RegEx Sites and a recommended book:
A Tao of Regular Expressions
http://www.sitescooper.org/tao_regexps.html
The Premier website about Regular Expressions
http://www.regular-expressions.info/
Regular Expression Test Page
http://www.fileformat.info/tool/regex.htm
Mastering Regular Expressions, Second Edition
http://www.amazon.com/exec/obidos/ASIN/0596002890/