[thelist] Excluding tags from a regular expression search
David Dorward
david at dorward.me.uk
Thu Sep 13 01:56:24 CDT 2007
On 12 Sep 2007, at 19:59, E Michael Brandt wrote:
> str=str.replace(/<[^>]+>/g,'');
Unfortunately, this fails when you get content such as:
<foo attribute="3 > 2" anotherAttribute="bar">
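To see why this happens, the pattern [^>]+ stops at the first ">" it meets. In the tag above, that is the ">" inside the quoted attribute value. Everything after it is then left behind as if it were text. Below is a minimal JavaScript reproduction using the quoted regular expression. The wrapper name stripTags is purely illustrative.

```javascript
// The regular expression from the quoted message, wrapped for reuse.
function stripTags(str) {
  return str.replace(/<[^>]+>/g, '');
}

// Markup from the example: the ">" inside the quoted attribute value
// ends the match early, so the rest of the tag survives as "text".
const input = '<foo attribute="3 > 2" anotherAttribute="bar">';
const result = stripTags(input);
console.log(JSON.stringify(result)); // " 2\" anotherAttribute=\"bar\">"
```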
Parsing HTML is quite hard and not something I'd like to leave to
regular expressions. You'd probably be better off running the code
through a proper HTML parser that can give you plain text (there's no
shortage of HTML->Text converters, you can use Lynx if you get really
stuck) and storing that alongside the markup (and then searching
that rather than the HTML).
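Here is a rough JavaScript sketch of that approach, with a few assumptions. Each record keeps a plain-text copy next to its markup, and searches run against that copy only. The names makeRecord, searchRecords and browserHtmlToText are hypothetical. The converter is passed in, so any real HTML-to-text tool can be plugged in. browserHtmlToText relies on the standard DOMParser API, so it only works in a browser, not in plain Node.js.

```javascript
// Hypothetical record shape: the original markup plus a plain-text copy.
// htmlToText is injected so any real HTML->text converter can be used.
function makeRecord(html, htmlToText) {
  return { html: html, text: htmlToText(html) };
}

// Case-insensitive search over the plain-text copy only, so tag names
// and attribute values in the markup can never produce false matches.
function searchRecords(records, term) {
  const needle = term.toLowerCase();
  return records.filter(r => r.text.toLowerCase().includes(needle));
}

// One possible converter in a browser: let the browser's own HTML parser
// build a document and read back its text content.
// (DOMParser is not available in Node.js without an extra library.)
function browserHtmlToText(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent;
}
```

In a browser you would call makeRecord(markup, browserHtmlToText) when the content is saved. The server-side equivalent would be to run the markup through a converter such as Lynx and store its output.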
--
David Dorward
http://dorward.me.uk/
http://blog.dorward.me.uk/