[Javascript] regexp - how to exclude a substring?
Paul Novitski
paul at novitskisoftware.com
Mon May 23 12:16:10 CDT 2005
Shawn,
Thanks very much for taking the time to write that code. I think it's
still a bit too primitive to do its job, as most of the HTML code I tested
gave funky results, usually breaking out at your 10-iteration limit or
reporting the length of the entire string even when the initial div is
closed early on (try "<div></div><div></div>"). But I think I recognize
one of your contributions: you're suggesting that once you locate an
opening <div and a closing </div, if the intervening string contains
another opening <div then you haven't yet found your initial tag's
close. Of course, in practice you'd actually need to count the number of
opening <div's and subtract the number of closing </div's to recognize
the initial tag's close when you came to it; otherwise your algorithm would
fail with multiply nested divs.
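
A minimal sketch of that counting idea, for illustration only. The function
name findMatchingDivEnd is hypothetical, and the sketch assumes that tags are
well-formed, that no attribute value contains ">", and that no div-like text
appears inside comments or scripts:

```javascript
// Hypothetical helper: returns the index just past the "</div>" that closes
// the first "<div" in html, or -1 if that div is never closed.
function findMatchingDivEnd(html) {
  const tagPattern = /<(\/?)div\b[^>]*>/gi; // opening or closing div tag
  let depth = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === "/") {
      if (depth === 0) continue;            // stray close before any open
      depth--;                              // one closing </div
      if (depth === 0) return tagPattern.lastIndex;
    } else {
      depth++;                              // one opening <div
    }
  }
  return -1;
}

findMatchingDivEnd("<div></div><div></div>");    // 11
findMatchingDivEnd("<div><div>a</div>b</div>c"); // 24
```

The loop still takes one step per tag, so it is walk-through parsing rather
than a single expression, but the regex does the tag recognition.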
I'm sorry that I wasn't more clear, but HTML parsing logic per se really
isn't my problem. I love parsing text, have been doing it for years, and I
can totally walk through HTML using baby steps of indexOf() or regex, to
validate structure and/or locate a target.
What I was hoping for was a single, clever regular expression that would do
a lot of the low-level parsing for me in one neat statement. I've been
hoping that because one can exclude a character (for example [^<] to
exclude '<') that one might be able to exclude an entire substring like
'</div'. It appears that you don't know of a way to do that, which is
disheartening, and I'll probably fall back on ordinary walk-through parsing
as usual, using RegEx to execute the incremental searches along the way but
still having to take each step in my code.
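
One pattern that may come close to "excluding a substring" is a negative
lookahead, (?!...), checked before every character consumed. The sketch below
is an illustration under assumptions, not a full answer: the variable names
are made up, and a regular expression of this kind still cannot count nesting
depth.

```javascript
// After the opening tag, consume one character at a time, but only while
// the text at that position does not start with "</div".
const firstClose = /<div\b[^>]*>(?:(?!<\/div)[\s\S])*<\/div>/i;

// Also refuse "<div", so only a div with no div inside it can match.
const innermost = /<div\b[^>]*>(?:(?!<\/?div\b)[\s\S])*<\/div>/i;

"<div>one</div><div>two</div>".match(firstClose)[0]; // "<div>one</div>"
"<div>a<div>b</div>c</div>".match(innermost)[0];     // "<div>b</div>"

// Limitation: with nested divs, firstClose stops at the inner close.
"<div>a<div>b</div>c</div>".match(firstClose)[0];    // "<div>a<div>b</div>"
```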
Splitting the HTML into an array on "<" is the most brilliant innovation
I've come up with so far, and I'm hoping to squeeze just a bit more blood
from the old stone yet...
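
For illustration, a small sketch of what that split produces. The sample
string and variable names are assumptions, and a literal "<" in text or in an
attribute value would also cause a split:

```javascript
const pieces = "<div><p>Hi</p></div>".split("<");
// pieces is ["", "div>", "p>Hi", "/p>", "/div>"]
// Every piece after the first began with "<" in the original string.

// Keep only the pieces that begin an opening or closing div tag.
const divPieces = pieces.filter(function (piece) {
  return /^\/?div\b/i.test(piece);
});
// divPieces is ["div>", "/div>"]
```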
Maybe you can answer a more general question I have about regular
expressions: why, when you search for
<div.*<\/div
does regexp return a string that stretches all the way to the last </div
found and not simply to the first one it encounters?
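
For reference, the * quantifier is greedy by default: .* first takes as many
characters as it can and then backs off only until the rest of the pattern
matches, which lands on the last </div. Adding ? after the quantifier makes it
lazy, so it stops at the first one. A short sketch with a made-up sample
string:

```javascript
const html = "<div>one</div><div>two</div>";

const greedy = html.match(/<div.*<\/div/)[0];
// greedy is "<div>one</div><div>two</div"

const lazy = html.match(/<div.*?<\/div/)[0];
// lazy is "<div>one</div"
```

Note that . does not match line breaks, so for multi-line HTML something like
[\s\S]*? would be needed in place of .*? here.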
Warm regards,
Paul
At 09:18 AM 5/23/2005, Shawn Milo wrote:
>Paul,
>
>I believe I have something that fulfills the requirements, although it
>seems to freeze
>my browser when I let it go through too many iterations.
>
>Take a look, and let me know what you think. I think there's a
>fundamental flaw somewhere, but I think we can find it as a group.
>
>Everyone who takes the time to look at it: Feedback please! I know
>there's some good stuff and some boo-boos in there, but I need help in
>telling the one from the other. ;o)
>
>Shawn
>
>
>
>
>_______________________________________________
>Javascript mailing list
>Javascript at LaTech.edu
>https://lists.LaTech.edu/mailman/listinfo/javascript