[Javascript] regexp - how to exclude a substring?

Paul Novitski paul at novitskisoftware.com
Sat May 21 16:51:47 CDT 2005


Shawn et al.,

I'm parsing some HTML using regular expressions but I'm stumped on one point:

I want to find a string that begins with "<div" and ends with "</div" that 
does not enclose a nested "</div".

I'm starting by locating a start & end tag pair:

	/<div.*>.*<\/div/si

[s = dot also matches newlines; i = case-insensitive]

I'm actually locating a specific tag using a regexp like this:

	/<div [^>]*id="target".*>.*<\/div/si

That finds my starting & closing tags, but if I've got multiple divs it 
finds everything up to & including the final </div on the page.
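
The over-long match described above can be reproduced with a small sketch. The sample markup and the variable names are made up for illustration, and the "s" flag assumes an engine that supports it (ES2018 or later). Both ".*" runs are greedy, so the engine takes as much as it can and only backs off as far as the last "</div" in the input:

```javascript
// Hypothetical sample page with two sibling divs (not from the original post).
const page =
  '<div id="target">first</div>\n' +
  '<div id="other">second</div>';

// The pattern quoted above. The "s" flag requires an ES2018+ engine.
const greedy = /<div [^>]*id="target".*>.*<\/div/si;

// Greedy ".*" runs past the first closing tag to the last "</div" in the input.
const greedyMatch = page.match(greedy)[0];
console.log(greedyMatch);
```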

Therefore as my next step I need to know how to exclude "</div" from the 
innerHTML of the div.  I've tried (.*(<\/div){0}) but it doesn't seem to work.

1) How do I say "allow any number of any characters but don't allow this 
substring"?

2) The direction I'm headed is to be able to include all nested divs in my 
target div.  In other words, the range of selected text should include only 
matched pairs of start & end tags of the same tagName as my target tag:

	<div id="target">
		<div>blah he blah</div>
		<div>blah he blah
			<div>blah he blah</div>
		</div>
	</div>

I figure that once I solve problem 1) I'll be able to assemble a regular 
expression that allows nested tags (<div...>...</div) at least to some 
reasonable level of nesting.  Any suggestions?
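
Both questions can be approached with a negative lookahead; what follows is a minimal sketch under stated assumptions, not a definitive answer. A "{0}" quantifier only ever matches the empty string, so it cannot forbid anything, whereas "(?!<\/div)" placed in front of each character refuses to step onto the unwanted substring. For question 2, a regular expression cannot count arbitrarily deep nesting, so the sketch builds a pattern for a fixed maximum depth. The names upToFirstClose, TEXT, innerSource and targetDivRegex are hypothetical, and the id passed in is assumed to contain no regex metacharacters.

```javascript
// Question 1: "any characters, but not this substring".
// (?!<\/div) refuses to step onto a closing div tag; [\s\S] then takes one
// character of any kind, newlines included, so no "s" flag is needed.
const upToFirstClose = /<div\b[^>]*>((?:(?!<\/div)[\s\S])*)<\/div/i;

const siblings = '<div>a</div><div>b</div>';
const firstOnly = siblings.match(upToFirstClose);
console.log(firstOnly[0]); // <div>a</div

// Question 2: nesting up to a fixed depth.
// TEXT is a run of characters containing no div tag at all (open or close).
const TEXT = '(?:(?!<\\/?div\\b)[\\s\\S])*';

// Pattern source for the inside of a div holding at most `depth` levels of divs.
function innerSource(depth) {
  if (depth === 0) return TEXT;
  const child = '<div\\b[^>]*>' + innerSource(depth - 1) + '<\\/div\\s*>';
  return TEXT + '(?:' + child + TEXT + ')*';
}

// Hypothetical helper; `id` is assumed to contain no regex metacharacters.
function targetDivRegex(id, depth) {
  const open = '<div\\b[^>]*\\bid="' + id + '"[^>]*>';
  return new RegExp(open + innerSource(depth) + '<\\/div\\s*>', 'i');
}

// The sample markup from above, followed by a sibling div that must be excluded.
const targetBlock = [
  '<div id="target">',
  '\t<div>blah he blah</div>',
  '\t<div>blah he blah',
  '\t\t<div>blah he blah</div>',
  '\t</div>',
  '</div>',
].join('\n');
const html = targetBlock + '\n<div id="after">not wanted</div>';

const found = html.match(targetDivRegex('target', 2));
console.log(found[0] === targetBlock); // true
console.log(html.match(targetDivRegex('target', 1))); // null: sample nests two levels
```

The depth is a hard limit: markup nested more deeply than the chosen depth produces no match at all rather than a wrong one, which makes the limit easy to notice and raise.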

Thanks,
Paul




