[thelist] howto RegularExpression in PHP

Geoffrey Sneddon foolistbar at googlemail.com
Mon Jan 8 10:14:41 CST 2007


On 8 Jan 2007, at 07:12, Max Schwanekamp wrote:

> S. F. Alim wrote:
>> I need help with the function `eregi_replace()` in PHP 4.4.4. Actually,
>> I need help forming a proper regex so I can remove the `<img . />`
>> image tag from HTML that comes from a database. All I want is to
>> remove this tag and all its attributes.
>
> If you're using regex in PHP, you're better off using the PCRE library
> (preg_match() and friends).  The POSIX Extended regex functions
> (eregi...()) are slower and less useful.
>
> Using preg_replace(), this should do it:
>
> $str = 'text text <img src="evil.gif" /> text text';
> echo preg_replace('/<img[^>]*>/iU','',$str);
> //outputs text text  text text

There's a bug in that which is just shouting at me; try running it on:

$str = 'text text <img title=" /> " src="evil.gif" /> text text'; // and yes, that is valid HTML
echo preg_replace('/<img[^>]*>/iU','',$str);
// outputs text text  " src="evil.gif" /> text text

For HTML, you need something like…

echo preg_replace('/<img((\s*(([^\s:]+:)?[^\s:]+)(\s*=\s*("([^"]*)"|\'([^\']*)\'|([a-z0-9\-._:]*)))?)*)\s*(\/)?>/i', '', $str);
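
One caveat about the HTML pattern above. Its attribute-name class `[^\s:]+` also matches `=`, quotes and `>`. When more markup follows the tag, backtracking can therefore carry the match on to the last `>` in the string. On `<img src="a.gif"> <b>bold</b>`, for example, it removes the whole string. The sketch below does not patch that pattern. It swaps in a different, simpler technique: inside the tag it steps over whole quoted values, so a `>` inside quotes cannot end the match and an unquoted `>` always does. `strip_img_tags()` is a hypothetical helper name. The sketch assumes every quoted value is closed, and it is not a full HTML parser (comments, `<script>` bodies and the like are not handled).

```php
<?php
// Hypothetical helper (not from the thread): remove every <img ...> tag,
// whether it ends in ">" or "/>", ignoring case.
// Inside the tag the pattern consumes, one piece at a time, either a single
// character that is not a quote or '>', or a complete "..." or '...' value,
// so a '>' inside a quoted attribute value does not end the match.
// Assumes every quoted value is closed; this is not a full HTML parser.
function strip_img_tags($html)
{
    // \b stops it from matching the start of a longer name such as <imgx>.
    $pattern = '/<img\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>/i';
    return preg_replace($pattern, '', $html);
}

echo strip_img_tags('text text <img title=" /> " src="evil.gif" /> text text'), "\n";
// prints: text text  text text

echo strip_img_tags('<IMG src="a.gif"> <b>bold</b>'), "\n";
// prints: " <b>bold</b>" (only the img tag is removed)
```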

For XML, you can go with something shorter like…

echo preg_replace('/<img((\s*(([^\s:]+:)?[^\s:]+)\s*=\s*("([^"]*)"|\'([^\']*)\'))*)\s*\/>/', '', $str);
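
Here is a small sketch of the XML variant in use, wrapped in a hypothetical helper `strip_xml_img_tags()`. As written, the pattern only removes empty-element tags of the form `<img ... />` whose attribute values are quoted (XML requires the quotes), and it is case-sensitive. The paired `<img ...></img>` form and HTML-style `<img ...>` without the slash are left untouched, so it is no substitute for the HTML pattern on HTML input.

```php
<?php
// Hypothetical wrapper (not from the thread) around the XML pattern quoted
// above, joined onto one line. There is no /i flag, which suits XML because
// element names are case-sensitive there.
function strip_xml_img_tags($xml)
{
    $pattern = '/<img((\s*(([^\s:]+:)?[^\s:]+)\s*=\s*("([^"]*)"|\'([^\']*)\'))*)\s*\/>/';
    return preg_replace($pattern, '', $xml);
}

// A self-closing tag with quoted values (one of them namespaced) is removed.
echo strip_xml_img_tags('a <img src="evil.gif" xml:lang=\'en\' /> b'), "\n";
// prints: a  b

// Without the closing "/>" there is no match, so an HTML-style tag is left alone.
echo strip_xml_img_tags('a <img src="evil.gif"> b'), "\n";
// prints: a <img src="evil.gif"> b
```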

If anyone finds any bugs in either, please don't hesitate to let me know.

- Geoffrey Sneddon




