[thelist] PHP RegEx
Jon Molesa
rjmolesa at consoltec.net
Wed Jul 11 12:40:27 CDT 2007
Wow, those line breaks look terrible. One more try.
The HTML I'm parsing looks like this:
<font face="Verdana, Arial, Helvetica" size="-2" class="small">
<a href="http://www.Company.com" target="_new">Company</a>
</font>
The <a> is optional, so it appears zero or one times, but the company
name will always be there.
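Because the link is optional (it appears zero or one times), you can wrap the `<a ...>` and `</a>` tags in `(?:...)?` to express that directly, and the name is captured either way. This is a minimal sketch. It assumes the markup always has the `class="small"` font wrapper shown above. `extractBizName`, `$withLink`, and `$withoutLink` are hypothetical names for illustration.

```php
<?php
// Hypothetical helper: returns the business name whether or not the
// optional <a> wrapper is present, or null when the markup doesn't match.
function extractBizName(string $html): ?string
{
    // (?:<a\s[^>]*>)? and (?:<\/a>)? make the link tags optional (zero or one);
    // the lazy [^<]*? plus the surrounding \s* keep whitespace out of the name.
    $pattern = '/class="small">\s*(?:<a\s[^>]*>)?\s*(?P<bizname>[^<]*?)\s*(?:<\/a>)?\s*<\/font>/';
    return preg_match($pattern, $html, $m) ? $m['bizname'] : null;
}

// Sample markup with the optional link present.
$withLink = '<font face="Verdana, Arial, Helvetica" size="-2" class="small">
<a href="http://www.Company.com" target="_new">Company</a>
</font>';

// Same markup without the link.
$withoutLink = '<font face="Verdana, Arial, Helvetica" size="-2" class="small">
Company
</font>';

echo extractBizName($withLink), "\n";    // Company
echo extractBizName($withoutLink), "\n"; // Company
```

Because `[^<]` cannot cross a tag, the name stops at `</a>` or `</font>`, whichever comes first.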
I don't really have a problem with my regex, since it works as is,
although I'm sure it could be improved. My real question is why does:
$pattern ='/(?:[.*]*class="small">[\s\n]*)<a\shref="(?:http|https)+(?::\/\/){1}(?P<domain>.*\..*\.(?:com|net|edu|biz|org|info|name|us|cc|tv|gov){1})(?:.*)"\starget="_new">(?P<bizname>.*)<\/a>/';
if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
    print_r($matches);
}
return this:
Array
(
    [0] => Array
        (
            [0] => class="small">[RETURNED WHITE SPACE REMOVED]<a href="http://www.Company.com" target="_new">Company</a>
            [domain] => www.Company.com
            [1] => www.Company.com
            [bizname] => Company
            [2] => Company
        )

)
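Two things explain that output.

First, a named group such as `(?P<domain>...)` is still an ordinary numbered capturing group, so PCRE reports it under both its name and its number. That is where `[1]` and `[2]` come from.

Second, `[0]` is always the text of the whole match. `(?:...)` only turns off capturing, not matching, so `class="small">` and the whitespace after it stay in `[0]`. The match starts at `class` rather than `<font` because `[.*]` is a character class matching a literal `.` or `*`, not "any characters".

A few other pieces are looser than they look:

- `(?:http|https)+` also accepts `httphttp`.
- `{1}` is always redundant.
- `\s` already includes `\n`.

Here is a tightened variant based on those observations. It assumes the attribute layout shown in the sample. The `\K` escape (which resets the reported match start) and the key-filtering step are illustrative additions, not part of the original code.

```php
<?php
// Sample markup from the post.
$subject = '<font face="Verdana, Arial, Helvetica" size="-2" class="small">
<a href="http://www.Company.com" target="_new">Company</a>
</font>';

// Tightened variant (a sketch, not the original pattern):
//   https?         instead of (?:http|https)+
//   \s*            instead of [\s\n]*   (\s already covers newlines)
//   [^"]*, [^<]*   instead of .*        (stop at the closing quote / next tag)
//   \K             reset the reported match start, so [0] begins at "<a"
$pattern = '/class="small">\s*\K<a\shref="https?:\/\/'
         . '(?P<domain>[^"\/]+\.[^"\/]+\.(?:com|net|edu|biz|org|info|name|us|cc|tv|gov))'
         . '[^"]*"\starget="_new">(?P<bizname>[^<]*)<\/a>/';

if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
    // Named groups are also numbered; keep the full match [0] plus the
    // string (named) keys, dropping the numeric duplicates [1] and [2].
    $named = array_map(function (array $set) {
        return array_filter($set, function ($key) {
            return $key === 0 || is_string($key);
        }, ARRAY_FILTER_USE_KEY);
    }, $matches);
    print_r($named);
}
```

With this change, `[0]` holds only the `<a ...>...</a>` element, and each named group appears once.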