[thelist] Regular Expression Question

Burhan Khalid thelist at meidomus.com
Sun Nov 9 01:16:46 CST 2003


Beau Hartshorne wrote:
> I am trying to write a regular expression (using PHP's preg_replace())
> that will take a string like this:
> 
> She said &quot;hello, world&quot; [<a target=&quot;_new&quot;
> href=&quot;link.html&quot;>1</a>]
> 
> And turn it into a string like this:
> 
> She said &quot;hi, there.&quot; [<a target="_blank"
> href="link.html">1</a>]
> 
> Basically, I only want to convert the quotes that sit between < and >.
> One expression I've come up with would work if there were only ever one
> attribute, but sometimes there are several, as in an img tag. The
> closest I've come is this:
> 
> $pattern = '/=&quot;(((?!&quot;).)*)&quot;/i';
> $replacement = '="\\1"';
> $string = preg_replace($pattern,$replacement,$string);
> 
> This will only match =&quot;anytext&quot; (note the equals sign). It
> will match all of the HTML attributes but will probably not match
> anything else. I guess I have two questions:
> 
> 1. Is an unencoded quote (") OK to use in HTML text outside of a tag?
> (If this is the case, I'll just do a $string =
> str_replace('&quot;','"',$string);.)
> 
> 2. Can anyone tell me how to re-write the regex so that it *only* makes
> changes to the &quot;s that sit inside <>?

I won't attempt the regex, since I'm still learning regular expressions 
myself. However, I am curious why you aren't using html_entity_decode()?

(slightly modified manual example):

$orig = "I'll &quot;walk&quot; the <b>dog</b> now";
$b = html_entity_decode($orig);
echo $b;
echo "\n";

This gives me:

I'll "walk" the <b>dog</b> now
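
Note that html_entity_decode() decodes every entity in the string, 
including the &quot;s in the text outside the tags. If you really only 
want to touch the ones between < and > (your question 2), something 
like the following might work. This is only a rough sketch, not a 
tested solution: decode_quotes_in_tags() is a name I made up, the type 
declarations assume PHP 7 or later, and the pattern assumes no tag 
contains a literal < or > inside an attribute value. It uses 
preg_replace_callback() to find each tag-like span and str_replace() 
to decode &quot; inside that span only.

```php
<?php
// Hypothetical helper (not from the original thread): decode &quot;
// only inside spans that look like tags, i.e. from "<" to the next ">".
function decode_quotes_in_tags(string $html): string
{
    return preg_replace_callback(
        '/<[^<>]*>/',  // one tag-like span with no nested < or >
        function (array $m): string {
            // $m[0] is the whole matched tag; text outside tags never gets here
            return str_replace('&quot;', '"', $m[0]);
        },
        $html
    );
}

$string = 'She said &quot;hello, world&quot; '
        . '[<a target=&quot;_new&quot; href=&quot;link.html&quot;>1</a>]';
echo decode_quotes_in_tags($string), "\n";
```

With that input, the &quot;s around "hello, world" should stay encoded 
while the ones in the target and href attributes become plain quotes.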

-- 
Burhan Khalid
thelist[at]meidomus[dot]com
http://www.meidomus.com
-----------------------
"Documentation is like sex: when it is good,
  it is very, very good; and when it is bad,
  it is better than nothing."


