[thelist] PHP RegEx

Jon Molesa rjmolesa at consoltec.net
Wed Jul 11 12:40:27 CDT 2007


Wow, those line breaks look terrible.  One more try.

The html I'm parsing looks like:

<font face="Verdana, Arial, Helvetica" size="-2" class="small">
	<a href="http://www.Company.com" target="_new">Company</a>
</font>

The <a> is optional, so it appears zero or one times.  But the company
name will always be there.
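
To make the optional <a> concrete, here is a minimal sketch of one way to match both shapes with a single pattern. It is not the pattern from this post. It assumes that when the link is absent the bare company name sits directly inside the <font> element, which is a guess about the markup, and parse_company() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the company name and, when a link is
// present, the host part of its URL. Returns null if nothing matches.
function parse_company(string $html): ?array
{
    $pattern = '~class="small">\s*'
        // Optional opening <a ...> tag; the host is captured as "domain".
        . '(?:<a\s+href="https?://(?P<domain>[^/"]+)[^"]*"\s+target="_new">)?'
        // The company name: everything up to the next tag, trimmed on the right.
        . '(?P<bizname>[^<]+?)\s*'
        // Optional closing </a>, then the closing </font>.
        . '(?:</a>\s*)?</font>~i';

    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    // "domain" is an empty string when there is no link.
    return ['domain' => $m['domain'] ?? '', 'bizname' => $m['bizname']];
}

// The markup from this post, plus an assumed link-less variant.
$linked = "<font face=\"Verdana, Arial, Helvetica\" size=\"-2\" class=\"small\">\n"
    . "\t<a href=\"http://www.Company.com\" target=\"_new\">Company</a>\n"
    . "</font>";
$plain = "<font face=\"Verdana, Arial, Helvetica\" size=\"-2\" class=\"small\">\n"
    . "\tCompany\n"
    . "</font>";

print_r(parse_company($linked));
print_r(parse_company($plain));
```

Using [^<] and [^"] instead of .* keeps each capture from running past the end of its own tag or attribute.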

My regex works as is, so I don't really have a problem with it, but
I'm sure it could be improved upon.  My question really is, why does:

$pattern ='/(?:[.*]*class="small">[\s\n]*)<a\shref="(?:http|https)+(?::\/\/){1}(?P<domain>.*\..*\.(?:com|net|edu|biz|org|info|name|us|cc|tv|gov){1})(?:.*)"\starget="_new">(?P<bizname>.*)<\/a>/';

if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
	print_r($matches);
}

return this:

Array
(
	[0] => Array
	(
		[0] => class="small">[RETURNED WHITE SPACE REMOVED]<a href="http://www.Company.com" target="_new">Company</a>
		[domain] => www.Company.com
		[1] => www.Company.com
		[bizname] => Company
		[2] => Company
	)
)
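
One thing visible in the output above is that each named group shows up twice, once under its name and once under its numeric index. That is how preg_match_all() reports named subpatterns. In case only the named entries are wanted, here is a minimal sketch that drops the integer keys. It assumes PHP 5.6 or later for ARRAY_FILTER_USE_KEY (and PHP 7 for the type declarations), uses a shortened pattern rather than the one from this post, and named_groups_only() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: keep only the string-keyed (named) entries
// of a single match array.
function named_groups_only(array $match): array
{
    return array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
}

$subject = '<a href="http://www.Company.com" target="_new">Company</a>';

// Shortened illustrative pattern with the same two named groups.
$pattern = '~<a\s+href="https?://(?P<domain>[^/"]+)[^"]*"\s+target="_new">'
    . '(?P<bizname>[^<]+)</a>~';

$rows = [];
if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $rows[] = named_groups_only($match);
    }
}

print_r($rows);
```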



