[Javascript] regexp - how to exclude a substring?
Shawn Milo
shawn.milo at gmail.com
Sat May 21 19:32:16 CDT 2005
Pseudocode regarding my previous e-mail:
var arrayPieces = [];
var closeTag = '</div>';
// indexOf() will match the first occurrence remaining in the string,
// so each piece cut off there should, by rights, also capture the opening tag.
// One conflict here is how to handle nested divs, another
// is handling a string with multiple lines. indexOf() treats \n like any
// other character, so the second issue only arises if a regex is used;
// in that case each \n could be replaced first.
// As for the first, depending on what type of parsing you're doing,
// a counter for the number of open divs (+= 1 for each <div>, -= 1 for
// each </div>) might do the trick.
var end = someHTML.indexOf(closeTag);
while (end !== -1) {
    // Store everything up to and including this closing tag.
    arrayPieces[arrayPieces.length] = someHTML.substring(0, end + closeTag.length);
    // Drop the stored piece and look for the next closing tag.
    someHTML = someHTML.substring(end + closeTag.length);
    end = someHTML.indexOf(closeTag);
}
// Now, loop through the array and do whatever parsing task is required.
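
Here is a rough sketch of two ideas from this thread, for whatever it is worth. The first function answers "allow any characters but don't allow this substring" with a negative lookahead, (?!...), which is part of JavaScript's regex syntax. The second uses the open-div counter described in the comments above to pull out a target div together with its nested divs. The function names findInnermostDiv and extractDivById and the sample markup are made up for illustration, and the second function assumes the id is written as id="..." with double quotes and contains no regex metacharacters. Neither is a substitute for a real HTML parser.

```javascript
// Sketch only: names and sample markup are hypothetical.

// "Any characters, but not this substring": before consuming each
// character, the negative lookahead (?!<\/?div\b) checks that no
// "<div" or "</div" starts at that position. [\s\S] matches any
// character, including newlines, so no special flag is needed.
function findInnermostDiv(html) {
    var re = /<div\b[^>]*>(?:(?!<\/?div\b)[\s\S])*<\/div>/i;
    var match = re.exec(html);
    return match ? match[0] : null;
}

// Counter approach: += 1 for each <div ...>, -= 1 for each </div>.
// When the count returns to zero, the target div has been closed.
function extractDivById(html, id) {
    // Assumption: id="..." uses double quotes and id has no regex metacharacters.
    var open = new RegExp('<div\\b[^>]*\\bid="' + id + '"[^>]*>', 'i');
    var start = html.search(open);
    if (start === -1) {
        return null;
    }

    var tag = /<(\/?)div\b[^>]*>/gi;
    tag.lastIndex = start; // begin scanning at the target's opening tag
    var depth = 0;
    var m;
    while ((m = tag.exec(html)) !== null) {
        depth += (m[1] === '/') ? -1 : 1;
        if (depth === 0) {
            // tag.lastIndex now points just past the matching </div>.
            return html.substring(start, tag.lastIndex);
        }
    }
    return null; // unbalanced markup: the target div was never closed
}

var sample =
    '<div id="target">\n' +
    '<div>blah he blah</div>\n' +
    '<div>blah he blah\n' +
    '<div>blah he blah</div>\n' +
    '</div>\n' +
    '</div>\n' +
    '<div>after</div>';

console.log(findInnermostDiv(sample));
console.log(extractDivById(sample, 'target'));
```

With Paul's sample markup, the first call should log the first innermost div, and the second should log the whole target div through its own closing tag, leaving out the trailing div that follows it.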
On 5/21/05, Shawn Milo <shawn.milo at gmail.com> wrote:
> One non-regex possibility would be to split the block of text into an
> array at every '>' character, or maybe use a loop to find and split at
> every '</div>'.
>
> I believe that regex lookbehinds are not supported in Javascript,
> although lookaheads such as (?=...) and (?!...) are.
>
> Of course, this is just an 'off the top of my head' idea, and it may
> not hold water. I'll think on it some more, and see if I can come up
> with something more tangible and elegant. I hope you don't need this
> immediately, as my wife and I shall be departing shortly to meet the
> future wife of a friend.
>
> Shawn
>
> On 5/21/05, Paul Novitski <paul at novitskisoftware.com> wrote:
> > Shawn et al.,
> >
> > I'm parsing some HTML using regular expressions but I'm stumped on one point:
> >
> > I want to find a string that begins with "<div" and ends with "</div" that
> > does not enclose a nested "</div".
> >
> > I'm starting by locating a start & end tag pair:
> >
> > /<div.*>.*<\/div/si
> >
> > [si = include newlines + case-insensitive]
> >
> > I'm actually locating a specific tag using a regexp like this:
> >
> > /<div [^>]*id="target".*>.*<\/div/si
> >
> > That finds my starting & closing tags, but if I've got multiple divs it
> > finds everything up to & including the final </div on the page.
> >
> > Therefore as my next step I need to know how to exclude "</div" from the
> > innerHTML of the div. I've tried (.*(<\/div){0}) but it doesn't seem to work.
> >
> > 1) How do I say "allow any number of any characters but don't allow this
> > substring"?
> >
> > 2) The direction I'm headed is to be able to include all nested divs in my
> > target div. In other words, the range of selected text should include an
> > even number of start & end tags of the same tagName as my target tag:
> >
> > <div id="target">
> > <div>blah he blah</div>
> > <div>blah he blah
> > <div>blah he blah</div>
> > </div>
> > </div>
> >
> > I figure that once I solve problem 1) I'll be able to assemble a regular
> > expression that allows nested tags (<div...>...</div) at least to some
> > reasonable level of nesting. Any suggestions?
> >
> > Thanks,
> > Paul
> >
> >
> > _______________________________________________
> > Javascript mailing list
> > Javascript at LaTech.edu
> > https://lists.LaTech.edu/mailman/listinfo/javascript
> >
>
>
> --
> Voicemail any time at:
> 206-666-MILO
>
--
Voicemail any time at:
206-666-MILO