[thelist] PHP RegEx

Anthony Baratta anthony at baratta.com
Wed Jul 11 23:04:09 CDT 2007


Jon Molesa wrote:
> Wow, thank you for taking the time.
> 
>> Here's an option:
>>
>> (?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)
>>
>> Returns:
>>
>> group(0)  	www.Company.com" target="_new">Company
> 
> Can you explain to me why Group(0) is returned at all?  

I'll do my best. I'm just a basic hack with RegEx.

Here's the full RegEx again:

(?<=<a href="https?://)(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)

The first clause is: (?<=<a href="https?://) which will match

	<a href="http://
	or
	<a href="https://

But that text is not retained in the match, because the clause uses the
look-behind option ?<=.

The last clause is: (?=</a>) which will match

	</a>

but that text is not retained either, because the clause uses the
look-ahead option ?=.

Match [0] is always the entire text the pattern matched, which is why it 
is returned at all. With those two as your opening and closing clauses, 
the pattern matches everything between them, hence the first match [0] of:

	www.Company.com" target="_new">Company
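
To make the look-around behaviour concrete, here is a minimal sketch. It is
written in Python rather than PHP, purely as an assumption of convenience,
and the sample link is made up because the original HTML was not quoted in
this message. Python's re module only accepts fixed-width look-behinds, so
the https? part is split into two look-behind alternatives; the rest of the
pattern is unchanged.

```python
import re

# Hypothetical sample input; the real HTML was not shown in the thread.
html = '<a href="http://www.Company.com" target="_new">Company</a>'

# Same pattern as above, except the look-behind is written as two
# fixed-width alternatives because Python's re rejects "https?" there.
pattern = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)'
)

match = pattern.search(html)
whole = match.group(0)  # group 0 is always the entire match
print(whole)
```

The look-behind and look-ahead have to be present in the input for the
match to succeed, but neither the opening tag text nor </a> ends up in
match [0].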

The middle clause is broken into three sections for matching purposes:

	(.[^"]*)

Wild Card: the dot matches any one character, then [^"]* matches 
everything up to (but not including) the next double quote, because that 
starts the next specific match. This returns match [1].

	(" target="_new">)

Match this text string specifically.  This is match [2].

	(.[^<]*)

Wild Card: the dot matches any one character, then [^<]* matches 
everything up to (but not including) the next less than sign, because 
that starts the next specific match. This returns match [3].
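
The three middle sections can be checked one by one with a short sketch.
As an assumption, this uses Python's re module instead of PHP, with a
made-up https link as input and the look-behind split into two fixed-width
alternatives (Python does not accept https? inside a look-behind).

```python
import re

# Hypothetical sample input, this time using https.
html = '<a href="https://www.Company.com" target="_new">Company</a>'

pattern = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)(" target="_new">)(.[^<]*)(?=</a>)'
)

match = pattern.search(html)
groups = match.groups()  # the numbered groups 1..3, without group 0

for number, text in enumerate(groups, start=1):
    print(number, repr(text))

# The leading dot accepts any single character, even a double quote;
# only the characters after it are restricted by [^"]*.
dot_accepts_quote = re.fullmatch(r'.[^"]*', '"abc') is not None
```

Writing [^"]+ instead of .[^"]* would be one way to rule out a leading
quote, if that ever matters for the input.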


One thing that would improve this RegEx is tweaking it so that 
(" target="_new">) is not returned as one of the match groups. I'm not 
exactly sure how to do that.
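
One common way to keep a section out of the numbered groups is a
non-capturing group, written (?:...). The sketch below is only an
illustration under assumptions: it uses Python's re module rather than PHP,
a made-up sample link, and a look-behind split into two fixed-width
alternatives because Python does not accept https? there.

```python
import re

# Hypothetical sample input; not taken from the original thread.
html = '<a href="http://www.Company.com" target="_new">Company</a>'

# (?:" target="_new">) still has to match, but it no longer gets a number.
pattern = re.compile(
    r'(?:(?<=<a href="http://)|(?<=<a href="https://))'
    r'(.[^"]*)(?:" target="_new">)(.[^<]*)(?=</a>)'
)

match = pattern.search(html)
url, label = match.groups()  # only two numbered groups remain
print(url)
print(label)
```

Match [0] is unchanged by this and still contains the target="_new" text;
only the numbering changes, so the link text moves from match [3] to
match [2].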

You can study the following post I previously did for another RegEx, 
which uses the same pattern - but does not have the multiple matches in 
the middle clause.

http://lists.evolt.org/archive/Week-of-Mon-20070625/190594.html

Hope that helps.

P.S. Several good RegEx Sites and a recommended book:

A Tao of Regular Expressions
http://www.sitescooper.org/tao_regexps.html

The Premier website about Regular Expressions
http://www.regular-expressions.info/

Regular Expression Test Page
http://www.fileformat.info/tool/regex.htm

Mastering Regular Expressions, Second Edition
http://www.amazon.com/exec/obidos/ASIN/0596002890/



