[thelist] CF: Regex Help [Solved] + Tip
Frank
lists at frankmarion.com
Fri Jul 18 03:17:26 CDT 2003
At 02:41 PM 2003-07-18 +1000, you wrote:
>> The original question is, how do I edit my regex to match the pattern
>> that does not end with \r or \n?
> Maybe this is answering the wrong question - But why can't you use a '$'
> to show the end of the line?
PRELUDE
----------------------------------------------------------------------
I busted my skull on this one, and in the end it turned out that the
problem is a bug in Studio. Entering the following worked; I had failed to
distinguish between Studio's handling of regexes and the server's.
#REFind("([\.a-zA-Z0-9_-]+)([a-z][^\.]{2})\.(co$)", "abcd.co")#
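For readers without a CF server handy, here is a rough equivalent of that REFind call, using Python's re module only as a stand-in (its engine is not CF's, so behavior may differ at the edges). The names TYPO_PATTERN, find_position and fix_typo are hypothetical. find_position imitates REFind's 1-based result (0 when nothing matches), and fix_typo sketches the substitution the pattern is ultimately meant for: turning ".co" into ".com" by reusing groups 1 and 2.

```python
import re

# The working pattern from above. Python's re is used here as a stand-in
# for ColdFusion's engine; the two are similar but not identical.
TYPO_PATTERN = r"([\.a-zA-Z0-9_-]+)([a-z][^\.]{2})\.(co$)"


def find_position(text):
    """Hypothetical helper imitating REFind: 1-based start of the match, or 0."""
    m = re.search(TYPO_PATTERN, text)
    return m.start() + 1 if m else 0


def fix_typo(address):
    """Hypothetical helper: keep groups 1 and 2, replace the trailing '.co' with '.com'."""
    return re.sub(TYPO_PATTERN, r"\1\2.com", address)


print(find_position("abcd.co"))  # 1
print(fix_typo("hotmail.co"))    # hotmail.com
```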
The following dissection of a regex may be useful to someone new to
regular expressions; it seems a waste to toss it now. And now back to our
regularly(!) scheduled program.
----------------------------------------------------------------------
Let me see if I can clarify the problem at hand.
This regular expression pattern is meant to let me correct a common typo:
typing ".co" where ".com" is intended (e.g. "hotmail.co"). But I don't want
to match every domain ending in "co", such as "domain.uk.co".
This is my current pattern:
([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])
[1] ([\.a-zA-Z0-9_-]+)  Match one or more of these characters
    - gets almost anything, including a dot (e.g. "hotmail.")
[2] ([a-z]{2}[^\.])     Match any two lowercase letters, then one character
                        that is not a dot*
    - fails for "domain.uk.co", because "uk" is preceded by a dot, which
      [a-z]{2} cannot match
[3] \.                  A literal dot
    - the dot before the next pattern (as in DOT com)
[4] (co[^m\.])          "co" followed by exactly one character other than
                        "m" or a dot
    - "co" AND (NOT ("m" OR dot)); \r and \n are neither, so they qualify
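The dissection maps directly onto a verbose-mode pattern. This is a sketch in Python's re (standing in for CF's engine), where re.VERBOSE lets each piece carry its note as a comment; CURRENT_PATTERN is a hypothetical name. Run against "hot2.co" followed by \r, it shows how the greedy [1] backs off to a single character so that [2] and [4] can fit.

```python
import re

# The [1]-[4] pattern, unchanged, written with re.VERBOSE so that each piece
# carries its note. Whitespace and comments outside character classes are ignored.
CURRENT_PATTERN = re.compile(r"""
    ([\.a-zA-Z0-9_-]+)   # [1] one or more of: dot, letters, digits, _ or -
    ([a-z]{2}[^\.])      # [2] two lowercase letters, then one non-dot character
    \.                   # [3] a literal dot
    (co[^m\.])           # [4] "co" plus exactly one character that is not m or .
""", re.VERBOSE)

m = CURRENT_PATTERN.search("hot2.co\r")
print(m.groups())  # ('h', 'ot2', 'co\r')
```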
fred.domain.uk.co <- fails because of [2]
hotmail.com       <- fails because "co" is followed by "m" [4]
hot2.co           <- matches, because "co" is followed by \r (not an "m")
hot3.co           <- fails because there is no character after "co", not even \r
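The four examples can be checked mechanically. This is a minimal sketch using Python's re as a stand-in for CF. It assumes "hot2.co" is followed by \r and "hot3.co" sits at the very end of the text; PATTERN and results are hypothetical names.

```python
import re

# The [1]-[4] pattern from the message, unchanged.
PATTERN = r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])"

# hot2.co is followed by \r; hot3.co is the last thing in the document,
# so nothing at all follows its "co".
examples = ["fred.domain.uk.co\r", "hotmail.com\r", "hot2.co\r", "hot3.co"]

results = {text: re.search(PATTERN, text) is not None for text in examples}
for text, found in results.items():
    print(repr(text), "->", "match" if found else "no match")
```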
My problem is finding a pattern that also matches "hot3.co". [4] *requires*
a character after "co": if "hot2.co" is followed by \r, it matches because
the pattern ["co" AND one character other than "m" or "."] exists. The bare
"hot3.co", not followed by ANY character, fails to match.
[^m\.] means the pattern cannot match until there is some character after
"co" that is not an "m" or a ".".
Likewise, [^\r\n] means "match one character that is not \r or \n", as
opposed to "DON'T match \r\n". It still consumes a character, so it fails
at the end of the document too.
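A negated class such as [^m\.] always consumes one character, which is why "hot3.co" cannot match. One alternative, swapped in here and not taken from the message, is a negative lookahead. It only checks the next position, so it also succeeds at the end of the string. The sketch uses Python's re and only the tail of the pattern ([3] and [4]). It does not assume that the CF engine in question supports (?!...), and match_text is a hypothetical helper.

```python
import re

# A negated class must consume a character; a negative lookahead only looks
# at the next position and also succeeds when there is no next character.
NEGATED_CLASS = r"\.co[^m\.]"   # needs a real character after "co"
LOOKAHEAD = r"\.co(?![m\.])"    # "co" not followed by m or ., or at the end


def match_text(pattern, text):
    """Hypothetical helper: the matched text, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group() if m else None


for text in ["hot2.co\r", "hot3.co", "hotmail.com", "domain.co.uk"]:
    print(repr(text), repr(match_text(NEGATED_CLASS, text)),
          repr(match_text(LOOKAHEAD, text)))
```

Note the design difference: with the lookahead the \r is no longer part of the match, so the matched text stays the same whether or not a line break follows.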
----------------------------------------------------------------------
Tangent:
1) The problem with using $ is that CF DOES NOT interpret $ as the end of a
line (just before \r or \n), but only as the end of the whole string, i.e.
the end of the *document*.
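For comparison, Python's re module (not CF) ties the end-of-line meaning of $ to a flag. Without re.MULTILINE, $ matches only at the end of the string or just before a final newline; with it, $ also matches before each \n. The last two lines show the \r\n trap: $ sits before the \n, so a \r still stands between "co" and the anchor. The pattern \w+\.co is a simplified stand-in for the full expression.

```python
import re

lf_text = "smith.co\nabcd.co"
crlf_text = "smith.co\r\nabcd.co"

# Without MULTILINE, $ matches only at the very end of the string
# (or just before a newline that ends the string).
plain = re.findall(r"\w+\.co$", lf_text)

# With MULTILINE, $ also matches just before every \n.
multi = re.findall(r"\w+\.co$", lf_text, re.MULTILINE)

# With \r\n line endings, the position before \n comes after the \r,
# so "co$" misses "smith.co" unless the pattern allows for the \r.
crlf_multi = re.findall(r"\w+\.co$", crlf_text, re.MULTILINE)
crlf_fixed = re.findall(r"\w+\.co(?=\r?$)", crlf_text, re.MULTILINE)

print(plain)       # ['abcd.co']
print(multi)       # ['smith.co', 'abcd.co']
print(crlf_multi)  # ['abcd.co']
print(crlf_fixed)  # ['smith.co', 'abcd.co']
```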
2) Take this list
domain.uk.co
hotmail.com
smith.co
abcd.co <- "o" is the very last char of the document.
The search for ([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.]) returns only
"smith.co" (8 characters); Studio returns "smith.co\r" (9 characters).
Unless someone can explain what I'm missing, I'll remain convinced that
these two items are bugs.
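As a point of comparison only, here is the same search run with Python's re module over the list. It assumes \r\n line endings and nothing after the final "abcd.co"; document and matches are hypothetical names. In Python, the character consumed by [^m\.] is part of the match text, so the single match is 9 characters long. This says nothing about how CF's own engine reports the match.

```python
import re

PATTERN = r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])"

# The list from the message, assuming \r\n line endings and no line
# break after the final "abcd.co".
document = "domain.uk.co\r\nhotmail.com\r\nsmith.co\r\nabcd.co"

matches = [m.group() for m in re.finditer(PATTERN, document)]
print([(s, len(s)) for s in matches])  # [('smith.co\r', 9)]
```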
----------------------------------------------------------------------
Regex: infinitely useful, frequently a bitch. I'm saving up all my hard-won
regexes and will add them to my site one day.
* NOTE: Any address with fewer than four characters before ".co" (e.g.
"abc.co") will fail, because [1] needs at least one character and [2]
exactly three.
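The note can be checked directly. This is a sketch in Python's re (a stand-in for CF) that uses the [1]-[4] pattern with a trailing \r, so [4] has a character to consume; PATTERN and results are hypothetical names. Look at the third case: a dotted prefix lets [1] take "x.", so a three-letter name still matches.

```python
import re

PATTERN = r"([\.a-zA-Z0-9_-]+)([a-z]{2}[^\.])\.(co[^m\.])"

# [1] needs at least one character and [2] exactly three, so at least
# four characters must precede ".co" for the pattern to match.
names = ["abc.co\r", "abcd.co\r", "x.abc.co\r"]
results = {name: re.search(PATTERN, name) is not None for name in names}

for name, found in results.items():
    print(repr(name), found)
```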
<tip type="Cold Fusion" author="Frank Marion">
Having a hard time getting your regular expression to work? Test it both in
Studio's advanced "Find" mode and in a regular function. Though exceedingly
similar, ColdFusion Studio's and ColdFusion Server's implementations have
some minor variations, enough to get in your way.
</tip>
--
Frank Marion lists at frankmarion.com Keep the signal high.