[thelist] Regexs and headaches

Dan Parry dan at virtuawebtech.co.uk
Thu Feb 9 08:19:28 CST 2006


Hi

Thanks for all the responses guys... really helpful stuff

I'll definitely be reading more about these tricky (yet amazingly useful)
little blighters :)

It's nice to have some support where people know what you are talking
about... a relatively unknown concept for me :)

Cheers

Dan

-----Original Message-----
From: thelist-bounces at lists.evolt.org
[mailto:thelist-bounces at lists.evolt.org] On Behalf Of Joshua Olson
Sent: 09 February 2006 13:43
To: thelist at lists.evolt.org
Subject: Re: [thelist] Regexs and headaches

> -----Original Message-----
> From: Dan Parry
> Sent: Thursday, February 09, 2006 7:29 AM

> I've successfully got it to locate all opening tags and 
> ignore self-closers (eg <br/>). it even picks up tags with 
> attributes
> 
> But (and this is a big but) it can't find single letter tags 
> (eg <b>). It can find single letter tags with attributes though 
> (eg <a href="http://example.org/">)
>
> Here is the regex:
> 
> /\<[^\/]([^<>]*)[^\/]>/g

Hi Dan,

The trick, I've found, to making a robust regex for matching HTML is to go
back to the RFC and build the pattern up from the full tag syntax.  Using
ColdFusion's version of regex (which uses slightly different tokens than
most regex flavors) I did this a while back.  The resulting regex can be
used to find all sorts of variations on tags.

Take a look here if you are interested:

http://concepts.waetech.com/unclosed_tags/
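
As for why the quoted pattern skips <b>: it asks for "<", one non-slash
character, any run of characters, a second non-slash character, and then
">", so at least two characters have to sit between the brackets. A rough
sketch in Python (not ColdFusion, and not the regex from the page above)
shows the difference. The names ORIGINAL, ADJUSTED and find_opening_tags
are made up for illustration, and ADJUSTED is only one possible fix under
the assumption that tags never contain "<" or ">" inside attribute values.

```python
import re

# The pattern from the quoted message, minus the /.../g delimiters.
# It needs two separate non-slash characters between "<" and ">".
ORIGINAL = re.compile(r"<[^/]([^<>]*)[^/]>")

# Hypothetical adjustment: one non-slash character, then an OPTIONAL
# tail that must end in a non-slash character. A lone letter now matches,
# while closing tags and self-closers (<br/>) are still rejected.
ADJUSTED = re.compile(r"<[^/<>](?:[^<>]*[^/<>])?>")


def find_opening_tags(pattern, text):
    """Return every full match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


sample = ('<p class="x">Hi <b>bold</b> and <i>it</i><br/>'
          '<a href="http://example.org/">link</a></p>')

print(find_opening_tags(ORIGINAL, sample))
print(find_opening_tags(ADJUSTED, sample))
```

On this sample the original pattern should report only the <p ...> and
<a ...> tags, while the adjusted one also picks up <b> and <i>. Neither
copes with comments, doctypes or ">" inside quoted attribute values, which
is the argument for building the pattern from the full syntax.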

<><><><><><><><><><>
Joshua L. Olson
WAE Tech Inc.
http://www.waetech.com/
Phone: 706.210.0168 
Fax: 413.812.4864

Monitor bandwidth usage on IIS6 in real-time:
http://www.waetech.com/services/iisbm/

