[thelist] Multi-lingual form field checking

Diane Soini dianesoini at earthlink.net
Thu Jan 8 09:23:52 CST 2004


Great, thank you Ben. I just didn't know what characters are part of 
those languages.

Now, I have another problem...

First of all, the technology being used is only static HTML with Apache 
SSI. No chance to change that at all.

I'm using JavaScript to extract the form variables from the URL and 
pre-populate the form. The user puts their query in the form, submits 
it to a CGI script (that I can't edit), and on the results page their 
query is pre-populated in the form so they can search again. The 
problem is that if a special character is entered, it gets encoded in 
the URL and you end up with some pretty ugly stuff in the form. I put 
in a Polish word and got back a lot of junk that looks like HTML 
entities.
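
For reference, the extraction step described above might look roughly 
like the sketch below. The function names parseQuery and decodePart 
are made up for illustration, and the sketch assumes the browser 
percent-encodes the query string as UTF-8, which depends on the page's 
character set and may not hold here.

```javascript
// Hypothetical helper: turn a query string such as "?q=caf%C3%A9&lang=pl"
// into an object mapping field names to decoded values.
function parseQuery(search) {
  var params = {};
  var query = search.replace(/^\?/, "");
  if (query === "") {
    return params;
  }
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var rawName = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    var rawValue = eq === -1 ? "" : pairs[i].slice(eq + 1);
    params[decodePart(rawName)] = decodePart(rawValue);
  }
  return params;
}

// Hypothetical helper: undo form encoding for one name or value.
function decodePart(text) {
  // Forms encode a space as "+", which decodeURIComponent leaves alone.
  var spaced = text.replace(/\+/g, " ");
  try {
    // Assumption: the percent escapes are UTF-8 bytes.
    return decodeURIComponent(spaced);
  } catch (e) {
    // Escapes that are not valid UTF-8 (for example Latin-1 bytes such
    // as %E9) make decodeURIComponent throw; keep the text unchanged.
    return spaced;
  }
}
```

On the results page this could be called as 
parseQuery(window.location.search), with the value for the search 
field (say, a field named q, which is only a guess) copied into the 
form input.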

How can I decode these entities with just JavaScript so that the form 
shows the original characters? Am I going to have to get some kind of 
chart of HTML entities for these non-English characters and do the 
conversions myself? Or is there a simpler way?
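
If the junk turns out to be numeric character references (for example 
%26%23322%3B in the URL, which is the percent-encoded form of &#322;), 
no chart should be needed, because the number is the character code 
itself. A minimal sketch under that assumption follows; the function 
names are hypothetical, and named entities such as &oacute; would 
still need a lookup table, which is not covered here.

```javascript
// Hypothetical helper: replace numeric character references such as
// "&#322;" (decimal) or "&#x142;" (hexadecimal) with their characters.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, body) {
    var isHex = body.charAt(0) === "x" || body.charAt(0) === "X";
    var code = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Leave references beyond the Unicode range untouched.
    if (code > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(code);
  });
}

// Hypothetical helper: undo the URL encoding first, then the references.
// Assumes the percent escapes are valid UTF-8 (plain ASCII qualifies).
function decodeFieldValue(raw) {
  var unescaped = decodeURIComponent(raw.replace(/\+/g, " "));
  return decodeNumericEntities(unescaped);
}
```

With these, decodeFieldValue("%26%23322%3B%26%23243%3Bd%26%23378%3B") 
gives the Polish word "łódź", since 322, 243 and 378 are the code 
points of ł, ó and ź.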

Diane

On Wednesday, January 7, 2004, at 11:07 PM, 
thelist-request at lists.evolt.org wrote:

> Hi Diane,
>
> I did some random typing in of Alt-key-plus-number-pad combinations to 
> guesstimate the range of characters you should include in your regexp 
> to check for word characters in the languages you list below. It looks 
> like the extended characters all live in the 128 to 165 range (for 
> example, Alt+128 is Ç, and Alt+165 is Ñ). So if you check for 
> [a-zA-ZÇ-Ñ0-9] in your regexp, you should be covered.
>
> --Ben
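
One caveat when trying the suggested character class in JavaScript: a 
regexp range is ordered by Unicode code point, not by the Alt-key 
numbers, so the range from Ç (U+00C7) to Ñ (U+00D1) only spans eleven 
uppercase letters. The sketch below compares it with a wider class; 
widerPattern is a hypothetical alternative covering the Latin-1 
letters and Latin Extended-A, and whether that matches the languages 
in question is an assumption.

```javascript
// The suggested class, written with escapes: Ç is \u00C7, Ñ is \u00D1.
var suggestedPattern = /^[a-zA-Z\u00C7-\u00D10-9]+$/;

// Hypothetical wider class: Latin-1 letters (skipping the multiplication
// and division signs at \u00D7 and \u00F7) plus Latin Extended-A, which
// includes Polish letters such as \u0142 (l with stroke).
var widerPattern = /^[a-zA-Z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]+$/;

// Returns true when the text consists only of characters in the class.
function isWord(pattern, text) {
  return pattern.test(text);
}
```

For example, isWord(suggestedPattern, "GARÇON") is true while 
isWord(suggestedPattern, "garçon") is false, because the lowercase ç 
(U+00E7) falls outside the range; widerPattern accepts both.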
