[thelist] Excluding tags from a regular expression search

Bill Moseley moseley at hank.org
Wed Sep 12 14:26:51 CDT 2007


On Wed, Sep 12, 2007 at 01:27:10PM -0500, will garrison wrote:
> First of all I'm pretty new to regular expressions, so please forgive
> my inexperience.
> 
> Here's what I'm trying to accomplish:
> 
> 1. user inputs a search term
> 2. using provided search term, search through a block of XHTML for matches.
> 3. highlight the matches

You need to know the context of a string (whether it sits inside a
tag, an attribute value, or a text node) before making the
substitutions.  I think it gets complex trying to parse XHTML with
regular expressions this way.

I've written a highlighter before and I used a parser (IIRC, libxml2)
to only operate on the text nodes.  I think I used the SAX parser so I
could maintain state across nodes.
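
A minimal sketch of the text-node-only idea, assuming Python's standard
html.parser event parser as a stand-in for libxml2's SAX interface (this
is not the original highlighter).  The names TextNodeHighlighter and
highlight, and the "hit" class on the wrapping span, are made up for
illustration.  Markup is re-emitted as-is and only text nodes are
searched, so a term that also appears in a tag or attribute is left
alone.  Doctypes and processing instructions are dropped, and script or
style content is not treated specially.

```python
import re
from html.parser import HTMLParser


class TextNodeHighlighter(HTMLParser):
    """Re-emit markup unchanged; wrap matches found in text nodes only."""

    def __init__(self, term):
        # convert_charrefs=False reports entities through their own
        # callbacks, so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(re.escape(term), re.IGNORECASE)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as written; attributes are never searched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_data(self, data):
        # The only place where substitution happens.
        self.out.append(self.pattern.sub(
            lambda m: '<span class="hit">%s</span>' % m.group(0), data))


def highlight(html, term):
    parser = TextNodeHighlighter(term)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(highlight('<p class="class">A class act</p>', "class"))
```

Here the class attribute is untouched and only the word in the
paragraph text gets wrapped.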

Gets tricky, too.  If someone searches for "programmer" what do you
do if the HTML is:

    <strong>prog</strong>rammer

I think that's why I used the SAX parser so I wouldn't break the tag
nesting and could highlight terms that might be in two nodes.  I also
had the requirement to highlight phrases, too.
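
One way to handle a term split across nodes, again only a sketch with
Python's html.parser and hypothetical names (Segmenter,
highlight_across_nodes, the "hit" class): collect the text nodes, search
their concatenation, then wrap each node's share of a match separately
so the existing tag nesting is never broken.  It assumes that joining
adjacent text nodes is acceptable, which means a match can also span a
block boundary such as two paragraphs, and it will not match across
entities.

```python
import re
from html.parser import HTMLParser


class Segmenter(HTMLParser):
    """Split a document into (is_text, string) segments, in order."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.segments = []

    def handle_starttag(self, tag, attrs):
        self.segments.append((False, self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.segments.append((False, self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.segments.append((False, "</%s>" % tag))

    def handle_entityref(self, name):
        self.segments.append((False, "&%s;" % name))

    def handle_charref(self, name):
        self.segments.append((False, "&#%s;" % name))

    def handle_data(self, data):
        self.segments.append((True, data))


def highlight_across_nodes(html, term):
    seg = Segmenter()
    seg.feed(html)
    seg.close()
    # Search the text as the reader sees it, ignoring the markup.
    text = "".join(s for is_text, s in seg.segments if is_text)
    hits = [m.span() for m in
            re.finditer(re.escape(term), text, re.IGNORECASE)]
    out, pos = [], 0  # pos: offset of the current text node in `text`
    for is_text, s in seg.segments:
        if not is_text:
            out.append(s)
            continue
        start, end = pos, pos + len(s)
        cursor = start
        for h_start, h_end in hits:
            # Clip each match to the part that lies inside this node.
            lo, hi = max(h_start, start), min(h_end, end)
            if lo >= hi:
                continue
            out.append(text[cursor:lo])
            out.append('<span class="hit">%s</span>' % text[lo:hi])
            cursor = hi
        out.append(text[cursor:end])
        pos = end
    return "".join(out)


print(highlight_across_nodes("<p><strong>prog</strong>rammer</p>",
                             "programmer"))
```

The match ends up as two spans, one inside the strong element and one
after it.  The same approach covers phrases, since a phrase is just a
longer string in the concatenated text.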


-- 
Bill Moseley
moseley at hank.org



