[thelist] RegEx problem again
Kasimir K
evolt at kasimir-k.fi
Mon Aug 8 12:11:42 CDT 2005
Tom Dell'Aringa wrote on 2005-08-08 15:14:
> - Minimum 3 characters
> - Only alpha characters
> - Can include a wildcard character of '*' (in which case you would
> require 4 characters)
>
> To that end I had this:
>
> var namematch = /^[a-zA-Z]{3,}$/.test(nametext.value);
> namematch = (namematch || /^[a-zA-Z\*]{4,}$/.test(nametext.value));
>
> Which worked just fine, till we realized we need to allow one or more
> SPACES as well! How do you add a space to the pattern? (and allow for
> as many spaces as they like?)
Where will the spaces be allowed? And how many consecutive spaces?
Kowalkowski, Lee (ASPIRE) wrote on 2005-08-08 15:34:
> Just add a space to the character sets, so [a-zA-Z] becomes [a-zA-Z ].
> Note the space before the closing square bracket.
This would allow names like '  a' (two spaces and a letter) and even
'   ' (three spaces).
You could use: /^[a-zA-Z][a-zA-Z ]{2,}$/
This first requires a-zA-Z, then allows spaces too. It would still match
'a  ' though (a character followed by two spaces).
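To see the difference between the two patterns so far, a quick check
along these lines can help. The sample inputs are made up for
illustration, and the variable names are mine, not from the thread.

```javascript
// Pattern with a space simply added to the character set.
var withSpace = /^[a-zA-Z ]{3,}$/;
// Pattern that requires a letter first, then letters or spaces.
var letterFirst = /^[a-zA-Z][a-zA-Z ]{2,}$/;

// Hypothetical sample inputs chosen to hit the edge cases:
// two spaces + letter, three spaces, letter + two spaces.
var samples = ['Tom', 'Tom D', '  a', '   ', 'a  '];

var results = samples.map(function (s) {
  return {
    input: s,
    withSpace: withSpace.test(s),
    letterFirst: letterFirst.test(s)
  };
});

results.forEach(function (r) {
  console.log(JSON.stringify(r.input), r.withSpace, r.letterFirst);
});
```

The first pattern accepts all five inputs. The second rejects the two
that start with a space, but still lets the letter followed by two
spaces through.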
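How about requiring that a space always follows a character:
/^[a-zA-Z]([a-zA-Z]+[ ]?){2,}$/
First an a-zA-Z, then, at least twice, one or more a-zA-Z followed by an
optional space. (Note the +: with * instead, the group could match
nothing at all, and even a single letter would pass.)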
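Another way to go, just as a sketch: check the shape and the length
separately instead of squeezing both into one pattern. The function name
isValidName is hypothetical, and I am assuming single spaces between
words, no leading or trailing space, and that only letters count toward
the minimum of 3. The '*' wildcard rule is left out here.

```javascript
// Hypothetical helper, not from the thread: shape and length are
// checked in two separate steps.
function isValidName(text) {
  // Shape: one or more words of letters, separated by single spaces,
  // with no leading or trailing space.
  var shapeOk = /^[a-zA-Z]+( [a-zA-Z]+)*$/.test(text);
  // Length: count only the letters, ignoring the spaces.
  var letterCount = text.replace(/ /g, '').length;
  return shapeOk && letterCount >= 3;
}

console.log(isValidName('Tom'));    // true
console.log(isValidName('a b c'));  // true
console.log(isValidName('a b'));    // false, only two letters
console.log(isValidName('abc '));   // false, trailing space
```

To allow runs of several spaces between words, the ' ' inside the group
could become ' +'.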
Even that leaves something to be desired: it would not match
Dell'Aringa, because of the ' character...
How about names like Rättö or Piña? 'ä', 'ö' and 'ñ' are not in between
'a' and 'z'...
You could use \w to match any alphanumeric character, but that has some
problems: it would include numbers and underscores, and it works
differently in different browsers... (put the following in your
browser's address bar: javascript:alert(/[\w]{2,}/.test('ññ')); - in
MSIE you get false, in FF true...)
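A small check of both \w problems, for the curious. The ECMAScript
specification defines \w as [A-Za-z0-9_] only, so an engine that follows
it should give the results in the comments; the variable names are just
for illustration.

```javascript
// \w accepts digits and the underscore, which are not wanted in a name.
var acceptsDigitsAndUnderscore = /^\w{3,}$/.test('a1_');

// Two n-with-tilde characters (U+00F1), written as escapes so the
// source file encoding does not matter.
var acceptsAccented = /^\w{2,}$/.test('\u00F1\u00F1');

console.log(acceptsDigitsAndUnderscore); // true
console.log(acceptsAccented);            // false in a spec-conforming engine
```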
A bit of a tricky situation, isn't it?
If you need to match names written with characters other than A-Z and
a-z, you could use Unicode escape notation: [\u0041-\u005A] matches
characters with code points between hex 41 and 5A (Latin capital letters
A through Z).
At http://www.unicode.org/Public/UNIDATA/UnicodeData.txt you'll find the
codepoints (among other data :-)
At http://www.unicode.org/Public/UNIDATA/Blocks.txt you'll find the
blocks (Hebrew is between 0590..05FF and Arabic 0600..06FF)
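As a rough sketch of how such ranges could be put to use: the pattern
below adds the accented letters of the Latin-1 Supplement block
(U+00C0 to U+00FF, minus the multiplication and division signs) plus the
apostrophe and space. Which ranges to include is my own assumption, and
the pattern only deals with the character set, not with where the spaces
may go.

```javascript
// A-Z, a-z, Latin-1 Supplement letters (skipping U+00D7 and U+00F7),
// apostrophe and space; at least 3 characters in total.
var latinName = /^[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF' ]{3,}$/;

// The Hebrew block from Blocks.txt, as a separate example.
var hebrewName = /^[\u0590-\u05FF]{3,}$/;

console.log(latinName.test("Dell'Aringa"));          // true
console.log(latinName.test('R\u00E4tt\u00F6'));      // true
console.log(latinName.test('Pi\u00F1a'));            // true
console.log(latinName.test('Tom1'));                 // false, digit
console.log(hebrewName.test('\u05D3\u05D5\u05D3'));  // true
```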
Sorry Tom if this confuses more than helps... but once I started to
think about Dell'Aringa, Rättö and Piña I just couldn't stop... And I
too would love to hear RegEx gurus' views on this.
.k