[thelist] Regex help

Joshua Olson joshua at waetech.com
Mon Nov 26 13:39:18 CST 2007


> -----Original Message-----
> From: Dan Parry
> Sent: Monday, November 26, 2007 2:19 PM
> 
> It breaks because of the whitespace in the text attribute... Now, it seems
> to me I have 2 options:
> 
> 1) Replace whitespaces in attributes with a different character (not my
> chosen solution)
> 
> 2) Break the attributes apart (into an array with format 'key="Value"') and
> deal with each attribute separately (preferred option)
> 
> For the life of me I'm not able to figure out the regex to do either (which
> is odd because usually regexes are my 'Rainman' syndrome)
> 
> The proposed regex need only focus on the 'attributes' not the 'tag name'


Hi Dan,

The trick is to use a regex that is designed from the ground up to match the
definition from the RFC.  Here's a bit of information on regexes for
matching HTML tags in a very robust fashion:

http://concepts.waetech.com/unclosed_tags/

That code is for CF (ColdFusion), but the concepts should carry over to PHP.
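
As a rough illustration of option 2 (breaking the attributes apart), here is
a minimal sketch in Python. It is not the code from the link above: the
pattern, the name ATTR_RE, and the function split_attributes are all
hypothetical, and the character classes are a simplified reading of HTML
attribute syntax rather than a full implementation of any spec. The key idea
is that a quoted value is matched as a unit, so whitespace inside it no
longer splits the attribute.

```python
import re

# Hypothetical pattern, written for illustration only. One attribute is a
# name, optionally followed by "=" and a double-quoted, single-quoted, or
# unquoted value. Quoted values are allowed to contain whitespace.
ATTR_RE = re.compile(
    r"""
    ([^\s"'<>/=]+)            # group 1: attribute name
    (?:
        \s*=\s*
        (?:
            "([^"]*)"         # group 2: double-quoted value
          | '([^']*)'         # group 3: single-quoted value
          | ([^\s"'=<>`]+)    # group 4: unquoted value
        )
    )?                        # the value part is optional (e.g. "checked")
    """,
    re.VERBOSE,
)


def split_attributes(attr_text):
    """Split the attribute portion of a tag into (name, value) pairs.

    The value is None for attributes that have no "=value" part.
    """
    pairs = []
    for match in ATTR_RE.finditer(attr_text):
        name = match.group(1)
        # At most one of groups 2-4 is set; take whichever one matched.
        value = next((g for g in match.groups()[1:] if g is not None), None)
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    sample = 'id="x1" text="Hello big world" checked size=10'
    for name, value in split_attributes(sample):
        # Rebuild the 'key="Value"' form asked for in the original question.
        print(name if value is None else '%s="%s"' % (name, value))
```

The same pattern should port to PHP's preg_match_all with little change,
though that is an assumption I have not tested. Note that the sketch expects
only the attribute text as input, not the tag name, and that a value quoted
with single quotes and containing a double quote would need escaping before
being rebuilt into the key="Value" form.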

Joshua




