[Javascript] regexp - how to exclude a substring?

Paul Novitski paul at novitskisoftware.com
Sat May 21 20:01:10 CDT 2005


Shawn,

Thanks very much for taking the time to help.  I blush to confess that it's 
not actually JavaScript I'm using but PHP -- I took the liberty of posting 
my question to this list because I don't know where else to get such expert 
advice on regular expressions... although the logic I'm working with could 
just as easily be implemented in js, in fact with little modification of 
the PHP syntax.

Pending the discovery of the right piece of regexp to do my parsing for me, 
I've used a temporary solution that's very much like what you suggest: I 
split the HTML into an array at every < (not >) so that each array element 
begins with either DIV or /DIV; then I walk the array beginning with my 
target element, incrementing a counter with each new DIV and decrementing 
it with each /DIV and breaking from the loop when it reaches zero, meaning 
that it's found the closing tag for the target element.  Then I join 
those array elements back together with '<' and, voilà, I've got the 
equivalent of outerHTML based on TAG#ID.
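
The split-and-count walk described above could be sketched in JavaScript roughly as follows. This is only an illustration, not the PHP actually in use: the function name extractOuterHtml and its (html, tagName, id) signature are hypothetical, and it assumes a plain tag name, a double-quoted id attribute, and no stray '<' characters in text or comments.

```javascript
// Hypothetical helper (name and signature are assumptions for illustration).
// Returns the outerHTML-like text of <tagName id="id">, or null if the
// element is not found or its tags are unbalanced.
function extractOuterHtml(html, tagName, id) {
  // Split at every '<' so each piece after the first begins with
  // either "tagName" or "/tagName" (or some other tag).
  const pieces = html.split('<');
  const open = new RegExp('^' + tagName + '(?=[\\s>])', 'i');
  const close = new RegExp('^/' + tagName + '(?=[\\s>])', 'i');
  const target = new RegExp('^' + tagName + '\\s[^>]*\\bid="' + id + '"', 'i');

  const start = pieces.findIndex((p) => target.test(p));
  if (start === -1) return null;

  let depth = 0;
  for (let i = start; i < pieces.length; i++) {
    if (open.test(pieces[i])) depth++;        // another opening tag
    else if (close.test(pieces[i])) depth--;  // a closing tag
    if (depth === 0) {
      // Keep the closing tag itself but drop any text that follows it.
      const end = pieces[i].indexOf('>');
      const last = end === -1 ? pieces[i] : pieces[i].slice(0, end + 1);
      // Rejoin with the '<' that split() removed.
      return '<' + pieces.slice(start, i).concat(last).join('<');
    }
  }
  return null; // ran out of input before the counter returned to zero
}

const page =
  '<p>x</p><div id="target"><div>a</div><div>b<div>c</div></div></div><p>y</p>';
console.log(extractOuterHtml(page, 'div', 'target'));
// <div id="target"><div>a</div><div>b<div>c</div></div></div>
```

Because the counter does the bookkeeping, the depth of nesting does not matter, which is the part a single regular expression has trouble with.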

The purpose of this is to extract segments of an HTML template for 
selective processing.

I'm currently contemplating how to process a complex CSS-style 
selector.  For the moment, templateGetElementById(tagName, id) is a good start.

You say:
>I believe that regex lookaheads and lookbehinds are not supported in
>Javascript.

Why would those be necessary in order to determine whether a matched pair of 
<div ... </div occurs within a string?  It seems that what I really need to 
know is how to say in regexp, "match this string if it contains any 
characters but NOT the substring '</TAG'".  With that tool, I can filter for 
nested TAGs inside my parent TAG.
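
For reference, one common way to say "any characters, but not this substring" is a negative lookahead applied before each character. JavaScript regular expressions do support lookahead; it is lookbehind that was missing until ES2018. The sketch below is an illustration only: the id "target" comes from the example markup in this thread, the sample strings are made up, and [\s\S] stands in for "any character including newlines" since older JavaScript engines have no s flag.

```javascript
// Match <div ... id="target" ...> up to the next </div>, where the body
// may contain any characters but never the substring "</div".
// (?:(?!<\/div)[\s\S])* means: repeat "one character, provided that
// '</div' does not start at this position".
const noCloseInside =
  /<div\b[^>]*\bid="target"[^>]*>((?:(?!<\/div)[\s\S])*)<\/div\s*>/i;

const flat = '<div id="target">one</div><div>two</div>';
const m = flat.match(noCloseInside);
console.log(m[0]); // <div id="target">one</div>
console.log(m[1]); // one

// With a nested div, the match stops at the FIRST closing tag,
// because the pattern excludes "</div" but does not count depth.
const nested = '<div id="target">a<div>b</div>c</div>';
const n = nested.match(noCloseInside);
console.log(n[0]); // <div id="target">a<div>b</div>
```

The second result shows why excluding the substring is only the first step: the pattern cannot pair up tags by itself, so arbitrary nesting still needs either a counter or a pattern repeated to some fixed depth.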

Regards,
Paul


At 05:24 PM 5/21/2005, Shawn Milo wrote:
>One non-regex possibility would be to split the block of text into an
>array at every '>' character, or maybe use a loop to find and split at
>every '</div>.'
>
>I believe that regex lookaheads and lookbehinds are not supported in
>Javascript.
>
>Of course, this is just an 'off the top of my head' idea, and it may
>not hold water. I'll think on it some more, and see if I can come up
>with something more tangible and elegant. I hope you don't need this
>immediately, as my wife and I shall be departing shortly to meet the
>future wife of a friend.
>
>Shawn
>
>On 5/21/05, Paul Novitski <paul at novitskisoftware.com> wrote:
> > Shawn et al.,
> >
> > I'm parsing some HTML using regular expressions but I'm stumped on one point:
> >
> > I want to find a string that begins with "<div" and ends with "</div" that
> > does not enclose a nested "</div".
> >
> > I'm starting by locating a start & end tag pair:
> >
> >         /<div.*>.*<\/div/si
> >
> > [si = include newlines + case-insensitive]
> >
> > I'm actually locating a specific tag using a regexp like this:
> >
> >         /<div [^>]*id="target".*>.*<\/div/si
> >
> > That finds my starting & closing tags, but if I've got multiple divs it
> > finds everything up to & including the final </div on the page.
> >
> > Therefore as my next step I need to know how to exclude "</div" from the
> > innerHTML of the div.  I've tried (.*(<\/div){0}) but it doesn't seem to work.
> >
> > 1) How do I say "allow any number of any characters but don't allow this
> > substring"?
> >
> > 2) The direction I'm headed is to be able to include all nested divs in my
> > target div.  In other words, the range of selected text should include an
> > even number of start & end tags of the same tagName as my target tag:
> >
> >         <div id="target">
> >                 <div>blah he blah</div>
> >                 <div>blah he blah
> >                         <div>blah he blah</div>
> >                 </div>
> >         </div>
> >
> > I figure that once I solve problem 1) I'll be able to assemble a regular
> > expression that allows nested tags (<div...>...</div) at least to some
> > reasonable level of nesting.  Any suggestions?
> >
> > Thanks,
> > Paul
> >
> >
> > _______________________________________________
> > Javascript mailing list
> > Javascript at LaTech.edu
> > https://lists.LaTech.edu/mailman/listinfo/javascript
> >
>
>
>--
>Voicemail any time at:
>206-666-MILO




