[thelist] Regexs and headaches
Joshua Olson
joshua at waetech.com
Thu Feb 9 07:43:21 CST 2006
> -----Original Message-----
> From: Dan Parry
> Sent: Thursday, February 09, 2006 7:29 AM
> I've successfully got it to locate all opening tags and
> ignore self-closers (eg <br/>). It even picks up tags with
> attributes
>
> But (and this is a big but) it can't find single letter tags
> (eg <b>). It can find single letter tags with attributes though
> (eg <a href="http://example.org">)
>
> Here is the regex:
>
> /\<[^\/]([^<>]*)[^\/]>/g
Hi Dan,
The trick I've found to making a robust regex for matching HTML is to go
back to the RFC and build the pattern out in full. I did this a while back
using ColdFusion's regex flavor (which uses slightly different tokens than
most regex engines). The resulting regex can be used to find all sorts of
variations on tags.
Take a look here if you are interested:
http://concepts.waetech.com/unclosed_tags/
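As for why the quoted pattern skips <b>: both [^\/] classes must each
consume one character, so at least two characters have to sit between
the < and the >. A single-letter tag has only one. The sketch below is
only an illustration, written in Python rather than ColdFusion. ORIGINAL
is the quoted pattern transliterated; RELAXED, the sample string, and
the opening_tags helper are hypothetical names of mine, not the regex
from the page linked above.

```python
import re

# The quoted pattern, transliterated from its /.../g literal.
# Each [^/] consumes one character, so "<b>" (one character
# between the brackets) can never match.
ORIGINAL = re.compile(r"<[^/]([^<>]*)[^/]>")

# Hypothetical alternative (an assumption, not from the thread):
# the lookahead rejects closing tags and the lookbehind rejects
# self-closers, and neither one consumes a character.
RELAXED = re.compile(r"<(?!/)[^<>]+(?<!/)>")

sample = '<b>bold</b><br/><a href="http://example.org">link</a><p class="x">'


def opening_tags(pattern, text):
    """Return every full match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


print(opening_tags(ORIGINAL, sample))
# ['<a href="http://example.org">', '<p class="x">']
print(opening_tags(RELAXED, sample))
# ['<b>', '<a href="http://example.org">', '<p class="x">']
```

The relaxed pattern is still a rough cut: it will also pick up
comments and doctype declarations, and it breaks on attribute values
that contain a literal >, which is where building from the RFC pays off.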
<><><><><><><><><><>
Joshua L. Olson
WAE Tech Inc.
http://www.waetech.com/
Phone: 706.210.0168
Fax: 413.812.4864
Monitor bandwidth usage on IIS6 in real-time:
http://www.waetech.com/services/iisbm/