[thelist] Excluding tags from a regular expression search

David Dorward david at dorward.me.uk
Thu Sep 13 01:56:24 CDT 2007

On 12 Sep 2007, at 19:59, E Michael Brandt wrote:
> str=str.replace(/<[^>]+>/g,'');

Unfortunately, this fails when you get content such as:

<foo attribute="3 > 2" anotherAttribute="bar">
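
To see the failure concretely, here is a small sketch that runs the quoted replace over that tag. The variable names and the trailing "Hello" text are made up for illustration; only the pattern and the tag come from this thread.

```javascript
// The pattern from the quoted message: "<", then everything up to the first ">".
var tagPattern = /<[^>]+>/g;

// The problem tag from above, followed by some made-up text content.
var markup = '<foo attribute="3 > 2" anotherAttribute="bar">Hello';

// The match ends at the ">" inside the attribute value, so the rest of
// the tag is left behind as if it were text.
var stripped = markup.replace(tagPattern, '');

console.log(stripped); // prints: ' 2" anotherAttribute="bar">Hello'
```

The pattern has no notion of being inside a quoted attribute value, which is exactly the state a real parser keeps track of.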

Parsing HTML is quite hard, and not something I'd like to leave to
regular expressions. You'd probably be better off running the code
through a proper HTML parser that can give you plain text (there's no
shortage of HTML->Text converters; you can use Lynx if you get really
stuck), storing that alongside the markup, and then searching that
rather than the HTML.
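
A rough sketch of the "store the text alongside the markup" idea, assuming the plain text has already been produced by a real converter. The record layout, the sample data and the findPages name are all hypothetical, not anything from the original code.

```javascript
// Hypothetical records: `text` is assumed to have been generated ahead of
// time by a proper HTML-to-text converter, not by a regular expression.
var pages = [
  { markup: '<p class="note">Parsing <em>HTML</em> is hard</p>', text: 'Parsing HTML is hard' },
  { markup: '<p>Use a <a href="/parser">parser</a></p>', text: 'Use a parser' }
];

// Search only the stored plain text, so tag and attribute names never match.
function findPages(records, term) {
  var needle = term.toLowerCase();
  return records.filter(function (record) {
    return record.text.toLowerCase().indexOf(needle) !== -1;
  });
}

console.log(findPages(pages, 'html').length);  // 1
console.log(findPages(pages, 'class').length); // 0, "class" only appears in the markup
```

The search itself stays trivial this way; all the hard work is pushed into the one-off conversion step, where a parser can do it properly.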

David Dorward
