[thelist] PHP RegEx

Anthony Baratta anthony at baratta.com
Wed Jul 11 23:04:09 CDT 2007

Jon Molesa wrote:
> Wow, thank you for taking the time.
>> Here's an option:
>> (?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)
>> Returns:
>> group(0)  	www.Company.com" target="_new">Company
> Can you explain to me why Group(0) is returned at all?  

I'll do my best. I'm just a basic hack with RegEx.

Here's the full RegEx again:

(?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)

The first clause is: (?<=<a href="https?://) which will match

	<a href="http://
	<a href="https://

But it does not retain that text in the match, because it uses the look 
behind option ?<=.

The last clause is: (?=</a>) which will match

	</a>

but does not retain it, because it uses the look ahead option ?=.

So with those as your opening and closing clauses it will match anything 
within those two clauses. Match [0] is always the complete match, hence 
the first match [0] of:

	www.Company.com" target="_new">Company
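
A rough sketch of that behaviour, in Python rather than PHP. Two 
assumptions to flag: the sample string in html is made up for the 
example, and Python's re module only accepts fixed-width look-behinds, 
so the (?<=<a href="https?://) clause is split into one look-behind for 
http and one for https. The rest of the RegEx is unchanged.

```python
import re

# Hypothetical sample input, not taken from the original thread.
html = '<a href="http://www.Company.com" target="_new">Company</a>'

# Python's re rejects variable-width look-behinds, so "https?" is
# expressed as two fixed-width look-behinds joined with "|".
lookbehind = r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
pattern = re.compile(
    lookbehind + r'(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)'
)

m = pattern.search(html)
whole_match = m.group(0)       # the complete match, i.e. match [0]
before = html[:m.start()]      # text the look-behind checked but did not keep
after = html[m.end():]         # text the look-ahead checked but did not keep

print(whole_match)
print(repr(before), repr(after))
```

The text in before and after had to be present for the match to succeed, 
but neither ends up inside match [0].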

The middle clause is broken into three sections for matching purposes:

	(.[^"]*)

Wild Card: the . matches any one character, then [^"]* matches anything 
except a double quote, because that starts the next specific match. This 
returns match [1].

	(" target="_new">)

Match this text string specifically.  This is match [2].

	(.[^<]*)

Wild Card: the . matches any one character, then [^<]* matches anything 
except a less than sign, because that starts the next specific match. 
This returns match [3].
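
To see the three sections as separate groups, here is a small sketch in 
Python (not PHP). The two-link sample text is invented for the example, 
and the look-behind is again written as two fixed-width alternatives 
because Python's re module requires that.

```python
import re

# Hypothetical sample text with one http link and one https link.
html = ('<a href="http://www.Company.com" target="_new">Company</a> '
        '<a href="https://example.org/a" target="_new">Example</a>')

pattern = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'  # look-behind, not kept
    r'(.[^"]*)'            # match [1]: everything up to the closing quote
    r'(" target="_new">)'  # match [2]: this literal text
    r'(.[^<]*)'            # match [3]: everything up to the next "<"
    r'(?=</a>)'            # look-ahead, not kept
)

# One (match[1], match[2], match[3]) tuple per link found.
rows = [(m.group(1), m.group(2), m.group(3)) for m in pattern.finditer(html)]
for row in rows:
    print(row)
```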

One way to improve this RegEx would be to tweak it so that 
(" target="_new">) is not returned as part of the match groups. Not 
exactly sure how to do that.
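
For what it's worth, a non-capturing group, written (?:...), looks like 
one way to do it. That syntax is part of PCRE, which PHP's preg 
functions use, as well as Python's re module. The sketch below is Python, 
with a made-up sample string and the look-behind split into two 
fixed-width alternatives to satisfy Python's re; treat it as an 
illustration rather than a tested PHP answer.

```python
import re

# Hypothetical sample input.
html = '<a href="https://www.Company.com" target="_new">Company</a>'

pattern = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)'              # still match [1]
    r'(?:" target="_new">)'  # must match, but is no longer a numbered group
    r'(.[^<]*)'              # now match [2] instead of match [3]
    r'(?=</a>)'
)

m = pattern.search(html)
groups = m.groups()
print(groups)
print(m.group(0))  # match [0] still contains the target="_new" text
```

Note that only the numbered groups change. Match [0] is still the whole 
matched text, so it keeps the " target="_new"> part in the middle.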

You can study the following post I previously did for another RegEx, 
which uses the same pattern - but does not have the multiple matches in 
the middle clause.


Hope that helps.

P.S. Several good RegEx Sites and a recommended book:

A Tao of Regular Expressions

The Premier website about Regular Expressions

Regular Expression Test Page

Mastering Regular Expressions, Second Edition
