[thelist] Multi-lingual form field checking

Ben Gustafson Ben_Gustafson at lionbridge.com
Wed Jan 7 09:17:39 CST 2004


Hi Diane,

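I typed in a bunch of Alt+number-pad combinations to guesstimate the range of characters your regexp should include to match word characters in the languages you list below. The extended characters all seem to live in the Alt+128 to Alt+165 range (for example, Alt+128 is Ç, and Alt+165 is Ñ). One caveat: those Alt numbers are DOS code page positions, while a range inside a JavaScript character class runs in Unicode order, so [Ç-Ñ] would only match Ç through Ñ (U+00C7 to U+00D1) and would miss the lowercase accented letters. If you check for [a-zA-Z0-9À-ÖØ-öø-ÿ] instead, you get the accented Latin-1 letters, which should cover most of what those six languages use (French œ sits outside that range and would need adding).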
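
Here is a rough sketch of what such a check might look like. The names WORD_CHARS and hasWord are made up for illustration and are not from Diane's script. It uses the Latin-1 letter ranges by code point (À-Ö, Ø-ö, ø-ÿ) rather than Alt-code numbers, since a JavaScript character class orders characters by Unicode code point, and it writes them as \u escapes so the file's encoding should not matter.

```javascript
// Sketch only: WORD_CHARS and hasWord are hypothetical names.
// Ranges are Unicode code points:
//   \u00C0-\u00D6  À to Ö
//   \u00D8-\u00F6  Ø to ö
//   \u00F8-\u00FF  ø to ÿ
// The two gaps skip the multiplication (×) and division (÷) signs.
var WORD_CHARS = /[a-zA-Z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]/;

// Returns true if the field value contains at least one word character.
function hasWord(value) {
  return WORD_CHARS.test(value);
}

hasWord("Ñandú");  // true
hasWord("été");    // true
hasWord("  -- ");  // false
```

If the goal is only to confirm that something other than whitespace was typed, testing against /\S/ is a simpler alternative, at the cost of also accepting punctuation.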

--Ben

> I've got some JavaScript that uses regular expressions to check the
> syntax of a form field. To check for the presence of a word, it is
> looking for [a-z0-9] characters. That's probably not enough characters
> even for English, but now that I'm at least pretty sure that the regexp
> works, I'm looking to expand it to work for non-English languages. It
> looks like I will have to apply this to the following languages:
> 
> French, German, Dutch, Portuguese, Italian, and Spanish.
> 
> What characters should I test for? Should I just change it to treat
> any non-space character as part of a word? That's probably the
> best thing to do for English, but will JavaScript blow up if it
> encounters non-English letters with accents and things no matter what
> I'm testing for? Will browsers like IE drop in the "wrong" characters,
> as they tend to do when they insert "smart" quotes into form fields?
> 
> Any help is greatly appreciated.
> 
> Thanks,
> Diane
> 

