[Javascript] regexp - how to exclude a substring?
Paul Novitski
paul at novitskisoftware.com
Sat May 21 16:51:47 CDT 2005
Shawn et al.,
I'm parsing some HTML using regular expressions but I'm stumped on one point:
I want to find a string that begins with "<div" and ends with "</div" that
does not enclose a nested "</div".
I'm starting by locating a start & end tag pair:
/<div.*>.*<\/div/si
[s = dot also matches newlines, i = case-insensitive]
I'm actually locating a specific tag using a regexp like this:
/<div [^>]*id="target".*>.*<\/div/si
That finds my starting & closing tags, but if I've got multiple divs it
finds everything up to & including the final </div on the page.
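The runaway match comes from the greedy ".*". It grabs as much as it can and then backs off only far enough to find a "</div", which is the last one on the page. A lazy ".*?" works from the other direction and stops at the first "</div". Below is a minimal sketch on a made-up two-div string. It also swaps the original ".*>" inside the opening tag for "[^>]*>" so the tag match cannot run past its own ">". Note that the s (dotAll) flag was added in ES2018; on older engines "[\s\S]" is the usual substitute for ".".

```javascript
// Made-up sample: the target div followed by a sibling div.
const html = '<div id="target">one</div>\n<div>two</div>';

// Greedy .* takes everything, then backs off only as far as the LAST "</div".
const greedy = html.match(/<div [^>]*id="target"[^>]*>.*<\/div/si)[0];
// greedy === '<div id="target">one</div>\n<div>two</div'

// Lazy .*? takes as little as possible, so it stops at the FIRST "</div".
const lazy = html.match(/<div [^>]*id="target"[^>]*>.*?<\/div/si)[0];
// lazy === '<div id="target">one</div'
```

The lazy form is enough when the target contains no divs. When divs are nested, the first "</div" belongs to a child instead.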
Therefore as my next step I need to know how to exclude "</div" from the
innerHTML of the div. I've tried (.*(<\/div){0}) but it doesn't seem to work.
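The "{0}" attempt fails because a quantifier of {0} means "repeat the group zero times". As a result, "(<\/div){0}" always matches the empty string and forbids nothing. The usual way to say "any characters, but not this substring" is to check a negative lookahead before every character: (?:(?!<\/div)[\s\S])*. This construct is sometimes called a tempered greedy token. The sketch below uses made-up sample strings:

```javascript
const sample = "<div>a</div><div>b</div>";

// The {0} attempt: the group repeats zero times, so it matches the empty
// string and rejects nothing. The greedy .* still runs to the last "</div".
const attempt = /<div[^>]*>(.*(<\/div){0})/si;
const attemptInner = sample.match(attempt)[1]; // "a</div><div>b</div>"

// Tempered token: before consuming each character, the negative lookahead
// (?!<\/div) checks that "</div" does not start at this position.
const noEndTag = /<div[^>]*>((?:(?!<\/div)[\s\S])*)<\/div/i;
const temperedInner = sample.match(noEndTag)[1]; // "a"

// Limitation: with nesting, the first "</div" belongs to the inner div.
const nested = '<div id="target"><div>x</div></div>';
const nestedInner = nested.match(noEndTag)[1]; // "<div>x"
```

The last case shows the limit of this technique. It answers "exclude a substring", but on its own it cannot find the close tag that matches the target.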
1) How do I say "allow any number of any characters but don't allow this
substring"?
2) The direction I'm headed is to be able to include all nested divs in my
target div. In other words, the range of selected text should include an
equal number of start & end tags of the same tagName as my target tag:
<div id="target">
<div>blah he blah</div>
<div>blah he blah
<div>blah he blah</div>
</div>
</div>
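For question 2, one alternative to a pure regex uses a regex only to find each <div ...> or </div> tag and keeps a depth counter in ordinary code. This is a different technique from the one proposed here. The target ends at the closing tag that brings the count back to zero. The sketch below is minimal: extractDivById is a hypothetical name, and the code assumes well-formed markup, a double-quoted id attribute, and no div tags hidden inside comments, scripts, or attribute values.

```javascript
// Hypothetical helper: returns the outer HTML of the div whose id equals
// `id`, including every nested div, or null if it is missing or unclosed.
// Assumes well-formed markup, a double-quoted id, and no div tags inside
// comments, scripts, or attribute values.
function extractDivById(html, id) {
  // Escape regex metacharacters in the id before building the pattern.
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const open = new RegExp(`<div\\b[^>]*\\sid="${safeId}"[^>]*>`, "i");
  const start = html.search(open);
  if (start === -1) return null;

  // Match every <div ...> or </div> tag, resuming from the target's start.
  const tag = /<(\/?)div\b[^>]*>/gi;
  tag.lastIndex = start;
  let depth = 0;
  let m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1; // a captured "/" marks a closing tag
    if (depth === 0) {
      // This closing tag balances the target's opening tag.
      return html.slice(start, tag.lastIndex);
    }
  }
  return null; // ran out of tags before the depth returned to zero
}

// The example above, with one made-up sibling div on each side.
const page = [
  '<div id="before">x</div>',
  '<div id="target">',
  "<div>blah he blah</div>",
  "<div>blah he blah",
  "<div>blah he blah</div>",
  "</div>",
  "</div>",
  "<div>after</div>",
].join("\n");

console.log(extractDivById(page, "target"));
```

This approach has no fixed nesting limit, because the counter can go as deep as the markup does.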
I figure that once I solve problem 1) I'll be able to assemble a regular
expression that allows nested tags (<div...>...</div) at least to some
reasonable level of nesting. Any suggestions?
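JavaScript regular expressions have no recursion, so no single pattern can match arbitrarily deep nesting. The fixed-depth idea can still work if the pattern is generated. At each level, the content may be either a character that does not start a div tag or a complete div from the level below. In the sketch below, anyDiv and targetDiv are hypothetical helpers, and the id is assumed to be double-quoted and free of regex metacharacters.

```javascript
// One character that does not begin a "<div" or "</div" tag.
const PLAIN = String.raw`(?!<\/?div\b)[\s\S]`;

// Pattern source for any div nested at most `levels` deep, counting itself
// as one level (levels >= 1). Each level wraps the previous one.
function anyDiv(levels) {
  let p = String.raw`<div\b[^>]*>(?:${PLAIN})*<\/div>`;
  for (let i = 1; i < levels; i++) {
    p = String.raw`<div\b[^>]*>(?:${PLAIN}|${p})*<\/div>`;
  }
  return p;
}

// Hypothetical helper: regex for the div with the given (double-quoted) id,
// allowing divs nested up to `levels` deep inside it. Assumes the id has no
// regex metacharacters.
function targetDiv(id, levels) {
  const inner = levels > 0 ? `|${anyDiv(levels)}` : "";
  return new RegExp(
    String.raw`<div\b[^>]*\sid="${id}"[^>]*>(?:${PLAIN}${inner})*<\/div>`,
    "i"
  );
}

// The example above: the target holds divs nested two levels deep.
const sample = [
  '<div id="target">',
  "<div>blah he blah</div>",
  "<div>blah he blah",
  "<div>blah he blah</div>",
  "</div>",
  "</div>",
].join("\n");

console.log(targetDiv("target", 2).exec(sample)[0]); // the whole sample
console.log(targetDiv("target", 1).exec(sample));    // null: nested too deep
```

A plain character is never allowed to start a div tag, so the two alternatives never compete at the same position, which keeps backtracking modest. The pattern grows with each extra level, and anything nested deeper than the chosen limit fails to match.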
Thanks,
Paul