[Javascript] regexp - how to exclude a substring?

Shawn Milo shawn.milo at gmail.com
Sat May 21 19:32:16 CDT 2005


Pseudocode regarding my previous e-mail:

var arrayPieces = [];
var closeTag = '</div>';
var pos;

// someHTML is assumed to already hold the markup to be split.
// indexOf will match the first occurrence remaining in the string;
// each piece runs from the start of what is left through that closing
// tag, which should, by rights, also capture the opening tag.
// One conflict here is how to handle nested divs, another
// is handling a string with multiple lines. Perhaps the entire
// string could have each \n replaced to handle the second issue.
// As for the first, depending on what type of parsing you're doing,
// a counter for the number of open divs (+= 1 for each <div>, -= 1 for
// each </div>) might do the trick.
while ((pos = someHTML.indexOf(closeTag)) !== -1) {
    arrayPieces[arrayPieces.length] = someHTML.substring(0, pos + closeTag.length);
    someHTML = someHTML.substring(pos + closeTag.length);
}

// Now, loop through the array and do whatever parsing task is required.
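
The open-div counter mentioned in the comments could look something like
the sketch below. The function name extractBalancedDiv and its arguments
are made up for illustration, and it assumes that "start" is the index of
the opening <div of interest and that no attribute value contains a '>'
character. It is a rough sketch, not a real HTML parser.

```javascript
// Hypothetical helper: returns the text from the <div at `start` through
// its matching </div>, or null if the tags never balance.
function extractBalancedDiv(html, start) {
    // Matches both opening and closing div tags; group 1 is "/" for a close.
    var tagPattern = /<(\/?)div\b[^>]*>/gi;
    var depth = 0;
    var match;

    tagPattern.lastIndex = start; // begin scanning at the target tag
    while ((match = tagPattern.exec(html)) !== null) {
        if (match[1] === '/') {
            depth -= 1; // closing tag
        } else {
            depth += 1; // opening tag
        }
        if (depth <= 0) {
            // depth 0: the target div just closed; below 0: stray close tag
            return depth === 0
                ? html.substring(start, match.index + match[0].length)
                : null;
        }
    }
    return null; // ran out of tags before the target div closed
}

var sample = '<p>x</p><div id="target"><div>a</div>' +
             '<div>b<div>c</div></div></div><div>after</div>';
var block = extractBalancedDiv(sample, sample.indexOf('<div id="target"'));
```

Here block ends at the </div> that balances the target tag, so the
trailing <div>after</div> is left out.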


On 5/21/05, Shawn Milo <shawn.milo at gmail.com> wrote:
> One non-regex possibility would be to split the block of text into an
> array at every '>' character, or maybe use a loop to find and split at
> every '</div>.'
> 
> I believe that regex lookbehinds are not supported in Javascript
> (lookaheads are).
> 
> Of course, this is just an 'off the top of my head' idea, and it may
> not hold water. I'll think on it some more, and see if I can come up
> with something more tangible and elegant. I hope you don't need this
> immediately, as my wife and I shall be departing shortly to meet the
> future wife of a friend.
> 
> Shawn
> 
> On 5/21/05, Paul Novitski <paul at novitskisoftware.com> wrote:
> > Shawn et al.,
> >
> > I'm parsing some HTML using regular expressions but I'm stumped on one point:
> >
> > I want to find a string that begins with "<div" and ends with "</div" that
> > does not enclose a nested "</div".
> >
> > I'm starting by locating a start & end tag pair:
> >
> >         /<div.*>.*<\/div/si
> >
> > [si = include newlines + case-insensitive]
> >
> > I'm actually locating a specific tag using a regexp like this:
> >
> >         /<div [^>]*id="target".*>.*<\/div/si
> >
> > That finds my starting & closing tags, but if I've got multiple divs it
> > finds everything up to & including the final </div on the page.
> >
> > Therefore as my next step I need to know how to exclude "</div" from the
> > innerHTML of the div.  I've tried (.*(<\/div){0}) but it doesn't seem to work.
> >
> > 1) How do I say "allow any number of any characters but don't allow this
> > substring"?
> >
> > 2) The direction I'm headed is to be able to include all nested divs in my
> > target div.  In other words, the range of selected text should include an
> > even number of start & end tags of the same tagName as my target tag:
> >
> >         <div id="target">
> >                 <div>blah he blah</div>
> >                 <div>blah he blah
> >                         <div>blah he blah</div>
> >                 </div>
> >         </div>
> >
> > I figure that once I solve problem 1) I'll be able to assemble a regular
> > expression that allows nested tags (<div...>...</div) at least to some
> > reasonable level of nesting.  Any suggestions?
> >
> > Thanks,
> > Paul
> >
> >
> > _______________________________________________
> > Javascript mailing list
> > Javascript at LaTech.edu
> > https://lists.LaTech.edu/mailman/listinfo/javascript
> >
> 
> 
> --
> Voicemail any time at:
> 206-666-MILO
> 
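
On question 1 in the quoted message ("allow any number of any characters
but don't allow this substring"), one possible approach is a negative
lookahead checked before every character. This is only a sketch: the
function name firstDivWithoutNestedClose is invented for illustration, and
it assumes the regex engine supports (?!...), which JavaScript engines
implementing ECMAScript 3 or later should. [\s\S] is used in place of "."
so that newlines are matched without relying on an "s" flag.

```javascript
// Hypothetical helper: returns the first "<div ...>...</div>" whose body
// contains no "</div", or null when there is no such match.
function firstDivWithoutNestedClose(html) {
    // (?:(?!<\/div)[\s\S])* consumes one character at a time, but only
    // while the text at that position does not start with "</div".
    var pattern = /<div\b[^>]*>((?:(?!<\/div)[\s\S])*)<\/div>/i;
    var match = pattern.exec(html);
    return match ? match[0] : null;
}

var nested = '<div id="target">\n<div>blah he blah</div>\n</div>';
var found = firstDivWithoutNestedClose(nested);
```

With the nested sample, found stops at the first </div rather than the
last one on the page. Balancing nested start and end tags (question 2)
still needs something beyond this pattern, such as counting tags.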




