[thelist] Regexp help?

Dougal Campbell dougal at gunters.org
Fri Sep 19 11:05:46 CDT 2003


I've been searching for a regexp that will let me do text replacements
inside HTML text, but not within HTML tags themselves. In other words,
if I had content like this:

  <a href="/meta/RSS.xml" title="My RSS feed">Our RSS 2.0 feed</a>

I want to be able to modify the text 'RSS' without breaking the link or
the title attribute.  Some of you might be tempted to suggest that I use
whitespace or '\b' word boundary assertions, but some of my
replacements are too complicated for that (as in the 'title' attribute
above). I actually need the regexp to match only text that is not inside
<> pairs. I was trying a negative look-ahead technique, but couldn't get
it quite right. Any suggestions?
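
One possible shape for the negative look-ahead idea is sketched below. It is only a sketch: the helper name replace_outside_tags() is made up for illustration, and it assumes reasonably well-formed markup with no literal '>' inside attribute values and no comments or script blocks to worry about. The look-ahead (?![^<>]*>) rejects a match whenever the next angle bracket after it is a closing '>', which is what happens when the match sits inside a tag.

```php
<?php
// Hypothetical helper (not a PHP built-in): replace a literal string
// only where it appears outside of <...> tags.
// Assumes well-formed markup and no '>' inside attribute values.
function replace_outside_tags($search, $replacement, $html)
{
    // preg_quote() escapes any regexp metacharacters in the search text.
    // (?![^<>]*>) fails the match if, scanning forward over non-bracket
    // characters, the next bracket is '>' (meaning we are inside a tag).
    $pattern = '/' . preg_quote($search, '/') . '(?![^<>]*>)/';
    return preg_replace($pattern, $replacement, $html);
}

$html = '<a href="/meta/RSS.xml" title="My RSS feed">Our RSS 2.0 feed</a>';
echo replace_outside_tags('RSS', 'Really Simple Syndication', $html), "\n";
```

With the sample link above, only the 'RSS' in the link text should be touched; the href and title attributes are left alone. Note that preg_replace() treats '$1' and '\1' style sequences in the replacement as backreferences, so replacement text containing those would need escaping.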

<tip type="PHP">
When doing lots of different text replacements using regular
expressions, take advantage of the preg_replace() function's ability to
accept arrays as arguments:

  // A bunch of regexp replacements we want to make:
  $patterns = array(
                '/\bRSS\b/' => 'Really Simple Syndication',
                '/\bXML\b/' => 'eXtensible Markup Language',
                '/\bHTML\b/' => 'Hypertext Markup Language'
                // etc.
  );

  // Extract the patterns and replacements separately:
  $keys = array_keys($patterns);
  $values = array_values($patterns);

  // Perform all the substitutions:
  $content = preg_replace($keys,$values,$content);

</tip>

-- 
Ernest MacDougal Campbell III, MCP+I, MCSE <dougal at gunters.org>
http://dougal.gunters.org/             http://spam.gunters.org/
  Web Design & Development:  http://www.mentalcollective.com/
       This message is guaranteed to be 100% eror frea!

