[thelist] CF: Regex Help [Solved] + Tip

Frank lists at frankmarion.com
Fri Jul 18 03:17:26 CDT 2003


At 02:41 PM 2003-07-18 +1000, you wrote:
 >> The original question is, how do I edit my regex to match the pattern
 >> that  does not end with \r or \n?

 > Maybe this is answering the wrong question - But why can't you use a '$'
 > to show the end of the line?

PRELUDE
----------------------------------------------------------------------
I busted my skull on this one, and in the end it turns out that the
problem is a bug in Studio. Entering the following worked. I had failed to
make the distinction between Studio's handling of regex and the server's.

#REFind("([\.a-zA-Z0-9_-]+)([a-z][^\.]{2})\.(co$)", "abcd.co")#

The following dissection of a regex may be useful to someone new to
regex. Seems like a waste to toss it now. And now back to our regularly(!)
scheduled program.
----------------------------------------------------------------------


Let me see if I can clarify the problem at hand.

This regular expression pattern is meant to allow me to correct a common 
typo: typing ".co" where ".com" is intended (example: "hotmail.co"). But I 
don't want to match any domain ending  with "co" as in 
"domain.uk.co".  What I do want to match is "hotmail.co", as an example.

This is my current pattern

       ([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])


[1]   ([\.a-zA-Z0-9_-]+)  Match any number of these characters
       - gets almost anything (including a dot; i.e. "hotmail.")

[2]   ([a-z]{2}[^\.])  Match any two alpha chars, then any one char except a dot*
       - fails if "co.uk" exists because "co" is followed by a dot
         (before "uk")

[3]   \.  And a dot
       - the dot before the next pattern (as in DOT com)

[4]   (co[^m\.])  And "co" followed by anything other than "m" or dot
       - "co" AND (NOT ("m" OR dot)) (\r and \n are neither)

fred.domain.uk.co  <- fails the match because of [2]
hotmail.com        <- fails because "co" is followed by "m" [4]
hot2.co            <- matches, because "co" is followed by a \r (not an "m" or a dot)
hot3.co            <- fails because there is no char after "co", not even \r
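
The four cases above can be checked mechanically. What follows is only a sketch in Python, whose "re" module stands in for the CF engine (an assumption: the two engines are not guaranteed to agree, which is rather the point of this thread). The trailing "\r" on the first three samples is also an assumption, standing in for the line ending in the original document.

```python
import re

# The pattern from the post, unchanged.
PATTERN = re.compile(r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])")

# Sample text -> whether the post says it should match.
# The "\r" endings are assumed; "hot3.co" deliberately has nothing after it.
expected = {
    "fred.domain.uk.co\r": False,  # rejected by [2]: ".uk" is not two letters + non-dot
    "hotmail.com\r": False,        # rejected by [4]: "co" is followed by "m"
    "hot2.co\r": True,             # "\r" satisfies [^m\.]
    "hot3.co": False,              # nothing left for [^m\.] to consume
}

results = {text: PATTERN.search(text) is not None for text in expected}

for text, matched in results.items():
    print(repr(text), "->", "match" if matched else "no match")
```

If the Python results ever disagree with what the CF server returns for the same strings, that is a hint the difference lies in the engine and not in the pattern.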

My problem is in finding a pattern that matches both cases. [4] *forces*
one more character to be present. If "hot2.co" is followed by \r, it's a
match because the pattern ["co" AND one character other than "m" or dot]
exists. The pure "hot3.co", not followed by ANY character, fails to match.

[^m\.] must consume a character, so the pattern cannot match until there
is some character after "co" that is not an "m" or ".".

Alternatively, using [^\r\n] means "match any character that IS (not \r\n)" 
as opposed to "DON'T match \r\n".
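
To make the "must consume a character" point concrete, here is a small sketch, again in Python as a stand-in for CF (an assumption). The second pattern, with its "line break OR end of input" tail, is a hypothetical variant for illustration; it is not the pattern from the post and has not been tried in Studio or on the server.

```python
import re

# Parts [1], [2] and [3] of the pattern from the post.
PREFIX = r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\."

# Original tail [4]: "co" plus one REQUIRED character that is not "m" or ".".
consuming = re.compile(PREFIX + r"(co[^m\.])")

# Hypothetical variant: "co" followed by a line break OR the end of the input.
either_end = re.compile(PREFIX + r"(co)(?:[\r\n]|$)")

with_cr = "hot2.co\r"   # "co" followed by \r
bare = "hot3.co"        # "co" is the very last thing in the input

consumed = consuming.search(with_cr).group(3)
print(repr(consumed))                      # the \r is part of the captured group
print(consuming.search(bare))              # None: no character left to consume

print(either_end.search(with_cr).group(3))
print(either_end.search(bare).group(3))
print(either_end.search("hotmail.com"))    # None: "m" is neither \r, \n nor the end
```

The negated class always needs a character to be there, while the alternation also accepts "nothing at all", which is the case the original pattern misses.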

----------------------------------------------------------------------
Tangent:
1) The problem with using $ is that CF DOES NOT interpret $ as the end of a 
line (\r or \n or null), but only null - the end of the *document*.

2)  Take this list

domain.uk.co
hotmail.com
smith.co
abcd.co <- "o" is the very last char of the document.

The search for ([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.]) will return
only "smith.co" (8 characters). Studio returns "smith.co\r" (9 characters).

Unless someone can explain where I may be missing something, I'll remain
convinced that these two items are bugs.
----------------------------------------------------------------------
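
For comparison, here is how the "end of line" versus "end of document" distinction looks in another engine. This is a sketch in Python, not in CF, and it assumes the list is separated by "\n" only (Python's multi-line "$" does not treat a lone "\r" as a line end). It says nothing about what CF or Studio actually do; it only shows the two readings of "$" side by side.

```python
import re

# The list from the tangent, assumed to be "\n"-separated, with no final newline.
doc = "domain.uk.co\nhotmail.com\nsmith.co\nabcd.co"

# Parts [1] to [3] from the post, with "co$" as the tail.
pattern = r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co$)"

# Default mode: "$" only matches at the end of the whole input.
whole_doc = [m.group(0) for m in re.finditer(pattern, doc)]

# MULTILINE mode: "$" also matches just before each "\n".
per_line = [m.group(0) for m in re.finditer(pattern, doc, re.MULTILINE)]

print(whole_doc)
print(per_line)
```

In default mode only the last entry is found; with the multi-line flag "smith.co" is found as well, while "domain.uk.co" is still rejected by [2].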

Regex: infinitely useful, frequently a bitch. I'm saving up all my hard-won
regexes and will add them to my site one day.

* NOTE: Any name shorter than four chars before the ".co" (e.g. "abc.co")
will fail because of the combination of [1] and [2]: [1] needs at least one
char and [2] needs exactly three.
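
A quick way to see the length limit from the note, sketched in Python as a stand-in for CF (an assumption), with an assumed "\r" after each sample so that [4] has a character to consume:

```python
import re

# The pattern from the post, unchanged.
PATTERN = re.compile(r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])")

# [1] needs at least one char and [2] needs exactly three,
# so at least four chars must come before ".co".
three_chars = PATTERN.search("abc.co\r")
four_chars = PATTERN.search("abcd.co\r")

print(three_chars)            # None: only three chars before ".co"
print(four_chars.groups())    # how the four chars split between [1] and [2]
```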



<tip type="Cold Fusion" author="Frank Marion">

Having a hard time getting your regular expression to work? Test it both in
Studio's advanced "Find" mode and in a regular function such as REFind().
Though exceedingly similar, ColdFusion Studio's and ColdFusion server's
implementations have some minor variations--enough to get in your way.

</tip>



--
Frank Marion     lists at frankmarion.com      Keep the signal high.  


