[thelist] Regular Expression Question

Sam-I-Am sam at sam-i-am.com
Thu Nov 13 10:25:25 CST 2003


> She said &quot;hello, world&quot; [<a target=&quot;_new&quot;
> href=&quot;link.html&quot;>1</a>]
> 
> Is translated to:
> 
> She said &quot;hi, there.&quot; [<a target="_blank"
> href="link.html">1</a>]
> 

hi Beau,
I'm also going to side-step your question. If you're using Perl, I'd 
recommend HTML::TokeParser (one of the HTML::Parser family of 
modules), which hands you each HTML tag (or text, comment, etc.) it 
encounters as a token. For a start tag, you get the tag name, attribute 
hash, and original string. That way you can focus on positive matches 
rather than trying to regexp your way through an entire document with 
all its possible exceptions.
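For comparison, here is a rough Python sketch of the same idea. It uses the standard library's html.parser.HTMLParser in place of HTML::TokeParser, so it is a swapped-in technique and not the Perl module's API. The class name TagRewriter, the rewrite() helper, and the rule mapping target="_new" to "_blank" are hypothetical; the rule is taken from the example in the quoted question. Each start tag arrives as a tag name plus a list of (name, value) pairs with entities already decoded, so the code only has to match the tags it cares about and rebuild them.

```python
from html import escape
from html.parser import HTMLParser


class TagRewriter(HTMLParser):
    """Hypothetical helper: echo a document back out, rebuilding each
    start tag from the parsed tag name and attributes."""

    def __init__(self):
        # convert_charrefs=False delivers entities in text (e.g. &quot;)
        # as separate events, so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _build(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <input checked>
                parts.append(name)
                continue
            # The parser has already unescaped &quot; inside the value, so an
            # unquoted value like target=&quot;_new&quot; arrives as '"_new"';
            # drop the leftover literal quotes.
            value = value.strip('"')
            if tag == "a" and name == "target" and value == "_new":
                value = "_blank"  # hypothetical rule from the quoted example
            parts.append(f'{name}="{escape(value)}"')
        return " ".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{self._build(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> get no separate end tag.
        self.out.append(f"<{self._build(tag, attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def rewrite(html_text):
    parser = TagRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


src = ('She said &quot;hello, world&quot; '
       '[<a target=&quot;_new&quot; href=&quot;link.html&quot;>1</a>]')
print(rewrite(src))
```

Text outside the tags passes through untouched. Doctypes and processing instructions are simply dropped in this sketch, and line breaks between attributes are collapsed to single spaces when a tag is rebuilt.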

Failing that, maybe split it into two steps? Use /<([^>]+)>/ to capture 
the contents of each tag, unescape the quotes (and do whatever else you 
need to), and write it back out.
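A minimal Python sketch of that two-step approach, using the same pattern (Python's re syntax is identical to Perl's here). The function name unescape_tag_quotes is hypothetical, and the target swap is only an illustration of "whatever else" you might do, borrowed from the quoted example. Because the pattern stops at the first >, a > inside an attribute value or a comment will trip it up, which is one more argument for the parser route.

```python
import re

# Same pattern as in the post: capture everything between < and >.
TAG = re.compile(r"<([^>]+)>")


def unescape_tag_quotes(html_text):
    """Hypothetical helper: unescape &quot; inside tags only."""

    def fix(match):
        inner = match.group(1).replace("&quot;", '"')
        # Illustrative extra step, taken from the quoted example.
        inner = inner.replace('target="_new"', 'target="_blank"')
        return "<" + inner + ">"

    # Text between tags is left exactly as it was.
    return TAG.sub(fix, html_text)


src = ('She said &quot;hello, world&quot; '
       '[<a target=&quot;_new&quot; href=&quot;link.html&quot;>1</a>]')
print(unescape_tag_quotes(src))
```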

hth
Sam
