[thelist] Excluding tags from a regular expression search
Bill Moseley
moseley at hank.org
Wed Sep 12 14:26:51 CDT 2007
On Wed, Sep 12, 2007 at 01:27:10PM -0500, will garrison wrote:
> First of all I'm pretty new to regular expressions, so please forgive
> my inexperience.
>
> Here's what I'm trying to accomplish:
>
> 1. user inputs a search term
> 2. using provided search term, search through a block of XHTML for matches.
> 3. highlight the matches
You need to know the context of a string (is it inside a tag, inside an
attribute value, or in a text node?) before making the substitutions,
and trying to work that out with regular expressions alone gets complex.
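To make the problem concrete, here is a minimal Python sketch (an illustration, not code from the original post) of the naive approach: a plain `re.sub` over the raw markup. The function name `naive_highlight` and the `<em>` wrapper are arbitrary choices. Because the regex cannot tell tag names and attribute values from text, it rewrites those too:

```python
import re


def naive_highlight(html: str, term: str) -> str:
    """Wrap every match of `term` in <em>, with no idea whether it is inside a tag."""
    return re.sub(re.escape(term), lambda m: "<em>" + m.group(0) + "</em>", html)


print(naive_highlight("<strong>strong</strong> coffee", "strong"))
# prints: <<em>strong</em>><em>strong</em></<em>strong</em>> coffee

print(naive_highlight('<a href="/strong.html">strong coffee</a>', "strong"))
# prints: <a href="/<em>strong</em>.html"><em>strong</em> coffee</a>
```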
I've written a highlighter before and I used a parser (IIRC, libxml2)
to only operate on the text nodes. I think I used the SAX parser so I
could maintain state across nodes.
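A rough sketch of the text-node-only idea, in Python. It uses the standard library's `html.parser.HTMLParser` as a stand-in for the libxml2 SAX parser; both are event-driven, with callbacks for start tags, end tags and text. The class name `TextNodeHighlighter`, the helper `highlight`, the `<em>` wrapper and the choice to skip `<script>`/`<style>` content are assumptions made for illustration. The `skip_depth` counter is an example of the state carried from one parser event to the next.

```python
import re
from html import escape
from html.parser import HTMLParser


class TextNodeHighlighter(HTMLParser):
    """Pass markup through unchanged; highlight `term` only inside text nodes."""

    SKIP = {"script", "style"}  # never highlight inside these (an assumption)

    def __init__(self, term):
        super().__init__(convert_charrefs=True)  # text arrives with entities decoded
        self.pattern = re.compile(re.escape(term), re.IGNORECASE)
        self.out = []
        self.skip_depth = 0  # state carried from one parser event to the next

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # tag exactly as written
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing, e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # html.parser reports tag names lowercased
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. the DOCTYPE

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")  # e.g. an <?xml ...?> declaration

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # script/style content is passed through raw
            return
        last = 0
        for m in self.pattern.finditer(data):
            self.out.append(escape(data[last:m.start()], quote=False))
            self.out.append("<em>" + escape(m.group(0), quote=False) + "</em>")
            last = m.end()
        self.out.append(escape(data[last:], quote=False))


def highlight(html, term):
    parser = TextNodeHighlighter(term)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.out)


print(highlight('<p><a href="/strong.html">strong</a> coffee<br/></p>', "strong"))
# prints: <p><a href="/strong.html"><em>strong</em></a> coffee<br/></p>
```

Markup is passed through as written and only text is searched and re-escaped. One side effect of decoding entities before matching: something like `&nbsp;` comes back out as a literal non-breaking space character.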
It gets tricky, too. If someone searches for "programmer", what do you
do if the HTML is:
<strong>prog</strong>rammer
I think that's why I used the SAX parser: so I wouldn't break the tag
nesting and could still highlight terms that span two nodes. I also had
to highlight phrases.
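One way to sketch the split-term case, again in Python with `html.parser` standing in for a SAX parser and with hypothetical names (`TokenCollector`, `highlight_across_nodes`). It records markup and text events in order, searches the concatenated text of all text nodes, then wraps each piece of a match separately inside the text node it falls in, so `<em>` never crosses a tag boundary and the nesting stays valid. A literal phrase is handled the same way as a single word.

```python
import re
from html import escape
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Record markup and text events in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []  # (kind, text) pairs; kind is "markup" or "text"

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.tokens.append(("markup", f"</{tag}>"))

    def handle_comment(self, data):
        self.tokens.append(("markup", f"<!--{data}-->"))

    def handle_data(self, data):
        self.tokens.append(("text", data))


def highlight_across_nodes(html, term):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()

    # Join all text nodes so a term split by tags is still one match.
    combined = "".join(text for kind, text in collector.tokens if kind == "text")
    matches = [m.span() for m in re.finditer(re.escape(term), combined, re.IGNORECASE)]

    out = []
    pos = 0  # offset of the current text node within `combined`
    for kind, text in collector.tokens:
        if kind == "markup":
            out.append(text)
            continue
        start, end = pos, pos + len(text)
        cursor = start
        for m_start, m_end in matches:
            lo, hi = max(m_start, start), min(m_end, end)
            if lo >= hi:
                continue  # this match doesn't touch this text node
            # Wrap only the part of the match inside this node, so <em> never crosses a tag.
            out.append(escape(combined[cursor:lo], quote=False))
            out.append("<em>" + escape(combined[lo:hi], quote=False) + "</em>")
            cursor = hi
        out.append(escape(combined[cursor:end], quote=False))
        pos = end
    return "".join(out)


print(highlight_across_nodes("<strong>prog</strong>rammer", "programmer"))
# prints: <strong><em>prog</em></strong><em>rammer</em>
```

A caveat with this sketch: concatenating every text node means a term can also match across block boundaries (a "foo" ending one `<p>` and a "bar" starting the next would match "foobar"), and script or style content is treated as ordinary text. A real implementation would probably reset the buffer at block-level elements and skip those.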
--
Bill Moseley
moseley at hank.org