<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META content="MSHTML 5.50.4522.1801" name=GENERATOR></HEAD>
<BODY>
<DIV>
<P><FONT face=Arial size=2>I have a rather tricky problem. I need to take any
HTML string, parse it and split it into smaller sections according to how the
tags are nested.</FONT></P>
<P><FONT face=Arial><FONT size=2>Example string:
<STRONG>Beforetags&lt;div first&gt;&lt;div second&gt;&lt;/div&gt;&lt;div
third&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;Aftertags</STRONG></FONT></FONT></P>
<P><FONT face=Arial size=2>Note how the tags are nested: one outer div with
two consecutive divs inside it. Inside the second inner div there's an
&lt;hr&gt;. I want to match all the innermost elements and ignore the outer one
for the moment.</FONT></P>
<P><FONT face=Arial size=2>The point is this: if I can match all the innermost
tags, I can remove them, run the remaining string through the regexp again to
get the next set of tags, and repeat <STRONG>until the string is completely
broken up into matching tags</STRONG>. Therefore I only need to match the
innermost tags.</FONT></P>
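<P><FONT face=Arial size=2>The repeat-until-done procedure above can be sketched as a loop. This is a minimal sketch, assuming JavaScript and a regexp with the g flag that matches one innermost element per match; the helper name peelInnermost and the test string are hypothetical, and the string deliberately contains only properly closed tags.</FONT></P>

```javascript
// Hypothetical helper: repeatedly collects every match of `re` (one
// innermost element per match), removes those matches from the string,
// and starts over until nothing matches. Returns the levels, innermost
// first, plus whatever text is left over.
function peelInnermost(html, re) {
  if (!re.global) throw new Error("peelInnermost needs a regexp with the g flag");
  const levels = [];
  let rest = html;
  for (;;) {
    const found = [...rest.matchAll(re)].map((m) => m[0]);
    if (found.length === 0) break;
    levels.push(found);
    // Removing this level turns the enclosing tags into the innermost ones.
    rest = rest.replace(re, "");
  }
  return { levels, rest };
}

// The regexp from the question, without the bold markup.
const innermost = /<(\w+)[^>]*>[^<]*<\/\1>/gi;

const result = peelInnermost("<b><i>x</i><u>y</u></b>", innermost);
console.log(JSON.stringify(result.levels)); // [["<i>x</i>","<u>y</u>"],["<b></b>"]]
console.log(JSON.stringify(result.rest));   // ""
```

<P><FONT face=Arial size=2>Since every match is at least one tag pair long, each pass shortens the string, so the loop always terminates.</FONT></P>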
<P><FONT face=Arial size=2>My solution so far:<BR>var regexp =
/<STRONG>&lt;(\w+)[^&gt;]*&gt;</STRONG>[^&lt;]*<STRONG>&lt;\/\1&gt;</STRONG>/ig
(bold for clarity)</FONT></P>
<P><FONT face=Arial size=2>First bold part: <STRONG>match the opening
tag</STRONG>: it starts with "&lt;", then one or more word characters (letters,
digits or underscores; this is the tag name, captured as \1), then zero or more
characters that aren't "&gt;", then a "&gt;".<BR>Middle part: <STRONG>zero or
more characters that aren't "&lt;"</STRONG>.<BR>Finishing part: <STRONG>the
matching closing tag</STRONG>: it starts with "&lt;/", then the same tag name
that was captured in the opening tag (\1), then a "&gt;".</FONT></P>
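<P><FONT face=Arial size=2>To make the three parts easier to check, the same expression can be assembled from separate strings. This is only an illustration (the variable names are made up); it assumes the RegExp constructor, where backslashes have to be doubled inside string literals. With the i flag, the back-reference \1 also matches a closing tag that differs only in letter case.</FONT></P>

```javascript
// The three parts of the question's expression, kept as separate strings.
// (Illustrative names; backslashes are doubled inside string literals.)
const openingTag = "<(\\w+)[^>]*>"; // "<", tag name captured as group 1, attributes, ">"
const middlePart = "[^<]*";          // zero or more characters that are not "<"
const closingTag = "<\\/\\1>";       // "</", the same tag name via back-reference \1, ">"

const regexp = new RegExp(openingTag + middlePart + closingTag, "gi");

// Same pattern as the literal /<(\w+)[^>]*>[^<]*<\/\1>/gi
console.log(regexp.source === /<(\w+)[^>]*>[^<]*<\/\1>/gi.source); // true

// Thanks to the i flag, \1 also matches a differently cased closing tag.
console.log(JSON.stringify("<P>text</p>".match(regexp))); // ["<P>text</p>"]
```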
<P><FONT face=Arial size=2>This works fine as long as the string is made up of
matching tag pairs, but it breaks down whenever an unclosed tag is used; in the
example, that is the &lt;hr&gt;. The expression can't match the outer div tag,
which is correct. It matches the first inner div, which is also correct,
<EM>but runs into problems when it tries to match the second inner div</EM>.
The middle part stops at the "&lt;" of the &lt;hr&gt; inside that div, so the
expression expects &lt;/div&gt; at that point, finds &lt;hr&gt; instead, and
simply skips the div.</FONT></P>
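<P><FONT face=Arial size=2>Running the current expression over the example string reproduces the behaviour described above. A minimal sketch, assuming an engine with String.prototype.matchAll (ES2020 or later):</FONT></P>

```javascript
const example =
  "Beforetags<div first><div second></div><div third><hr></div></div>Aftertags";
const innermost = /<(\w+)[^>]*>[^<]*<\/\1>/gi; // the question's current regexp

// First pass: only the empty inner div is found; the div holding <hr> is skipped.
const firstPass = [...example.matchAll(innermost)].map((m) => m[0]);
console.log(JSON.stringify(firstPass)); // ["<div second></div>"]

// After removing it, nothing else matches, so the breakdown stalls.
const rest = example.replace(innermost, "");
console.log(rest); // Beforetags<div first><div third><hr></div></div>Aftertags
console.log(innermost.test(rest)); // false
```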
<P><FONT face=Arial size=2>What I need:<BR>Somehow I need to replace the middle
part of the expression with something that says "keep testing as long as I don't
encounter a <STRONG>closing tag</STRONG>" instead of "keep testing as long as I
don't encounter a <STRONG>&lt;</STRONG>".</FONT></P>
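<P><FONT face=Arial size=2>One possible middle part (an assumption, not the only approach) is "plain text, or a tag that is known never to be closed". Any other "&lt;", whether it starts an opening or a closing tag, still ends the middle part, so only innermost elements match. The VOID list below is hypothetical and incomplete; real input would need the full list of void elements.</FONT></P>

```javascript
// Hypothetical list of elements that never take a closing tag (extend as needed).
const VOID = "br|hr|img|input|meta|link";

// Middle part: any non-"<" character, or a whole void tag such as <hr>.
// Any other "<" (an opening or a closing tag) ends the middle part.
const middle = `(?:[^<]|<(?:${VOID})\\b[^>]*>)*`;
const innermost = new RegExp(`<(\\w+)[^>]*>${middle}<\\/\\1>`, "gi");

const example =
  "Beforetags<div first><div second></div><div third><hr></div></div>Aftertags";

// First pass: both inner divs are found, including the one holding <hr>.
const firstPass = example.match(innermost);
console.log(JSON.stringify(firstPass)); // ["<div second></div>","<div third><hr></div>"]

// Remove them and run again: the outer div is now the innermost element.
const rest = example.replace(innermost, "");
console.log(JSON.stringify(rest.match(innermost))); // ["<div first></div>"]
```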
<P><FONT face=Arial size=2>I've tried something like
<STRONG>[^(&lt;/)]*</STRONG> for the middle part, adjusting the third part of
the expression accordingly, but with no good result. What I want that part of
the regexp to say is "as long as I don't encounter a &lt;/, keep testing", but
I can't get a (<EM>pattern</EM>) to work like this: [^(<EM>pattern</EM>)]*
(zero or more of anything not equal to <EM>pattern</EM>).</FONT></P>
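<P><FONT face=Arial size=2>The reason [^(&lt;/)]* behaves differently is that square brackets always describe a set of single characters: [^(&lt;/)] means "one character that is not (, &lt;, / or )", and the parentheses lose their grouping meaning. Standard JavaScript regexps can express "zero or more characters, none of which starts a given pattern" with a negative lookahead, (?:(?!pattern)[\s\S])*. The sketch below compares the two; note that this construct alone also lets a match run across an opening tag, so here it would still need to be combined with some restriction on opening tags.</FONT></P>

```javascript
// [^(</)] is a character class: it matches ONE character that is not
// "(", "<", "/" or ")". The parentheses inside [] are literal characters.
const charClass = /^[^(<\/)]*$/;
console.log(charClass.test("a/b")); // false: "/" alone is excluded
console.log(charClass.test("(x)")); // false: parentheses are excluded too

// (?:(?!<\/)[\s\S])* reads "zero or more characters, none of which is the
// start of '</'". A lone "<" or "/" is still allowed.
const notClosing = /^(?:(?!<\/)[\s\S])*$/;
console.log(notClosing.test("a/b <hr> c")); // true
console.log(notClosing.test("text</div>")); // false

// Used as the middle part, it finds the div that contains <hr>...
const lookaheadMiddle = /<(\w+)[^>]*>(?:(?!<\/)[\s\S])*<\/\1>/i;
console.log(lookaheadMiddle.exec("<div third><hr></div>")[0]); // <div third><hr></div>

// ...but it also crosses opening tags, so here it starts at the outer div.
console.log(lookaheadMiddle.exec("<div first><div second></div>")[0]); // <div first><div second></div>
```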
<P><FONT face=Arial size=2>I'm sorry about the long question, but I hope someone
with a good grasp of pattern matching can solve this one. It's critical to me,
and I would be most grateful.</FONT></P>
<P><FONT face=Arial size=2>/Cloak</FONT></P></DIV></BODY></HTML>