[Javascript] regexp - how to exclude a substring?

Paul Novitski paul at novitskisoftware.com
Mon May 23 12:16:10 CDT 2005


Shawn,

Thanks very much for taking the time to write that code.  I think it's 
still a bit too primitive to do its job, as most of the HTML code I tested 
gave funky results, usually breaking out at your 10-iteration limit or 
reporting the length of the entire string even when the initial div is 
closed early on (try "<div></div><div></div>").  But I think I recognize 
one of your contributions: you're suggesting that once you locate an 
opening <div and a closing </div, if the intervening string contains 
another opening <div then you haven't yet found your initial tag's 
close.  Of course, in practice you'd actually need to count the number of 
opening <div's and subtract the number of closing </div's to recognize 
the initial tag's close when you came to it; otherwise your algorithm would 
fail with multiply nested divs.
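
A minimal sketch of that counting approach, for illustration only.  The 
function name findDivClose and its return convention are my own assumptions, 
not Shawn's code, and it ignores complications such as "<div" appearing 
inside comments or attribute values:

```javascript
// Hypothetical helper: returns the index just past the </div> that closes
// the first <div> found at or after `start`, or -1 if it is never closed.
function findDivClose(html, start) {
  var tag = /<(\/?)div\b[^>]*>/gi; // matches both <div ...> and </div>
  tag.lastIndex = start || 0;
  var depth = 0;
  var m;
  while ((m = tag.exec(html)) !== null) {
    if (m[1] === "/") {
      if (depth === 0) continue;   // stray closing tag before any opening
      depth--;                     // one closing tag: subtract
      if (depth === 0) return tag.lastIndex; // the initial tag's close
    } else {
      depth++;                     // one opening tag: add
    }
  }
  return -1;
}

findDivClose("<div></div><div></div>");    // 11, not the whole string
findDivClose("<div><div>a</div>b</div>c"); // 24, just before "c"
```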

I'm sorry that I wasn't more clear, but HTML parsing logic per se really 
isn't my problem.  I love parsing text, have been doing it for years, and I 
can totally walk through HTML using baby steps of indexOf() or regex, to 
validate structure and/or locate a target.

What I was hoping for was a single, clever regular expression that would do 
a lot of the low-level parsing for me in one neat statement.  I've been 
hoping that, because one can exclude a character (for example [^<] to 
exclude '<'), one might be able to exclude an entire substring like 
'</div'.  It appears that you don't know of a way to do that, which is 
disheartening, and I'll probably fall back on ordinary walk-through parsing 
as usual, using RegEx to execute the incremental searches along the way but 
still having to take each step in my code.
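
For what it's worth, one construct that may come close is a negative 
lookahead, (?!...), repeated one character at a time.  The sketch below 
assumes the JavaScript engine supports lookahead, and the variable names are 
only illustrative.  It excludes a substring but does not count nesting, so 
it is not a complete parser:

```javascript
// (?!<\/div) succeeds only where the text ahead is NOT "</div".
// Repeating "lookahead, then one character" consumes text up to,
// but never across, the first "</div".
var untilClose = /<div\b[^>]*>((?:(?!<\/div)[\s\S])*)<\/div>/i;
var first = untilClose.exec("<div>one</div><div>two</div>");
// first[0] is "<div>one</div>", first[1] is "one"

// Excluding "<div" as well as "</div" finds a div with no div inside it.
var innermost = /<div\b[^>]*>((?:(?!<\/?div\b)[\s\S])*)<\/div>/i;
var inner = innermost.exec("<div>a<div>b</div>c</div>");
// inner[0] is "<div>b</div>", inner[1] is "b"
```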

Splitting the HTML into an array on "<" is the most brilliant innovation 
I've come up with so far, and I'm hoping to squeeze just a bit more blood 
from the old stone yet...

Maybe you can answer a more general question I have about regular 
expressions: why, when you search for
         <div.*<\/div
does regexp return a string that stretches all the way to the last </div 
found and not simply to the first one it encounters?
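
To make the question concrete, here is a small sketch of the behaviour.  The 
sample string is my own.  The * quantifier is greedy: it takes as much as it 
can and then backs up only far enough to let the rest of the pattern match.  
Adding ? after it makes it lazy, so it stops at the first </div.  Note that 
. does not match newlines, so [\s\S] would be needed for multi-line HTML:

```javascript
var html = "<div>one</div><div>two</div>";

// Greedy: .* runs to the end of the string, then backtracks to the LAST "</div".
var greedy = html.match(/<div.*<\/div/)[0];
// greedy is "<div>one</div><div>two</div"

// Lazy: .*? grows one character at a time and stops at the FIRST "</div".
var lazy = html.match(/<div.*?<\/div/)[0];
// lazy is "<div>one</div"
```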

Warm regards,

Paul



At 09:18 AM 5/23/2005, Shawn Milo wrote:
>Paul,
>
>I believe I have something that fulfills the requirements, although it
>seems to freeze my browser when I let it go through too many iterations.
>
>Take a look, and let me know what you think. I think there's a
>fundamental flaw somewhere, but I think we can find it as a group.
>
>Everyone who takes the time to look at it: Feedback please! I know
>there's some good stuff and some boo-boos in there, but I need help in
>telling the one from the other.  ;o)
>
>Shawn
>
>
>
>
>_______________________________________________
>Javascript mailing list
>Javascript at LaTech.edu
>https://lists.LaTech.edu/mailman/listinfo/javascript




