[thelist] Multi-lingual form field checking

Ben Gustafson Ben_Gustafson at lionbridge.com
Thu Jan 8 14:34:44 CST 2004


Hi Diane,

escape() and unescape() are the JavaScript functions for converting a string to and from its URL-encoded form, so calling unescape() on the value you pull out of the URL should give you back the original characters. See http://www.dolcevie.com/js/js_encode.html for a nice demo.
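
Here is a rough sketch of how the decoding step might look. It is only a sketch: getQueryParam and decodeNumericEntities are made-up helper names, and it assumes the CGI hands the query back in the URL either as UTF-8 percent-escapes, as single-byte %XX escapes, or as numeric entities like &#322; (which is what some browsers send for characters the page's charset can't hold). It tries decodeURIComponent() first and falls back to unescape() if the value isn't valid UTF-8.

```javascript
// Hypothetical helper: pull one parameter out of a query string
// (for example window.location.search) and return it decoded.
function getQueryParam(search, name) {
  var pairs = search.replace(/^\?/, "").split("&");
  for (var i = 0; i < pairs.length; i++) {
    var idx = pairs[i].indexOf("=");
    if (idx === -1) continue;                       // skip "flag" with no value
    var key = pairs[i].substring(0, idx);
    // Forms encode spaces as "+", so turn those back into spaces first.
    var raw = pairs[i].substring(idx + 1).replace(/\+/g, " ");
    if (unescape(key) === name) {
      try {
        return decodeURIComponent(raw);             // UTF-8 percent-escapes
      } catch (e) {
        return unescape(raw);                       // single-byte %XX escapes
      }
    }
  }
  return null;                                      // parameter not present
}

// Hypothetical helper: turn decimal numeric entities such as "&#322;"
// back into characters. Assumes code points below 65536.
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}
```

On the results page you would then do something like setting the field's value to decodeNumericEntities(getQueryParam(window.location.search, "q")), where "q" stands in for whatever your form field is actually called.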

--Ben

> Great, thank you Ben. I just didn't know what characters are part of 
> those languages.
> 
> Now, I have another problem...
> 
> First of all, the technology being used is only static HTML with
> Apache SSI. No chance to change that at all.
> 
> I'm using javascript to extract the form variables from the URL and 
> pre-populate the form. So, the user puts their query in the form, 
> submits to a CGI script (that I can't edit), and on the results page 
> their query is pre-populated in the form so they can search again. The
> problem is that if a special character is entered, the URL encodes it 
> and you end up with some pretty ugly stuff in the form. I put in a 
> Polish word and got back a lot of junk. It looks like HTML entities.
> 
> What can I do with just javascript about decoding these entities so I 
> can put them in as the original pretty characters in the form?  Am I 
> going to have to get some kind of chart of HTML entities for these 
> non-English characters and do some conversions? Or is there a simpler 
> way?
> 
> Diane
> 
> On Wednesday, January 7, 2004, at 11:07 PM, 
> thelist-request at lists.evolt.org wrote:
> 
> > Hi Diane,
> >
> > I did some random typing in of Alt-key-plus-number-pad combinations to
> > guesstimate the range of characters you should include in your regexp
> > to check for word characters in the languages you list below. It looks
> > like the extended characters all live in the 128 to 165 range (for
> > example, Alt+128 is Ç, and Alt+165 is Ñ). So if you check for
> > [a-zA-ZÇ-Ñ0-9] in your regexp, you should be covered.
> >
> > --Ben

