[thelist] Multi-lingual form field checking
Ben Gustafson
Ben_Gustafson at lionbridge.com
Thu Jan 8 14:34:44 CST 2004
Hi Diane,
escape() and unescape() are the JavaScript functions for converting a string to and from its URL-encoded form. See http://www.dolcevie.com/js/js_encode.html for a nice demo.
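Here is a minimal sketch of how the decoding step might look, assuming the query arrives as an ordinary name=value&name=value string. The function names (getQueryValue, decodeField, decodeNumericEntities) and the field name "q" are made up for illustration. It tries decodeURIComponent() first because that one understands UTF-8 byte sequences, and falls back to unescape() when the bytes are not valid UTF-8. Since you mentioned the junk looks like HTML entities, it also converts numeric references of the form &#NNN; back to characters; that part is a guess about what your CGI hands back.

```javascript
// Hypothetical helper: URL-decode one raw name or value from a query string.
function decodeField(raw) {
  // Forms encode spaces as "+", so restore them before percent-decoding.
  var spaced = raw.replace(/\+/g, " ");
  try {
    return decodeURIComponent(spaced); // handles UTF-8 sequences like %C5%82
  } catch (e) {
    return unescape(spaced);           // fallback for single-byte %XX values
  }
}

// Hypothetical helper: turn decimal references such as "&#322;" into characters.
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}

// Hypothetical helper: return the decoded value of one field, or null if absent.
function getQueryValue(queryString, fieldName) {
  var pairs = queryString.replace(/^\?/, "").split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var rawName = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    var rawValue = eq === -1 ? "" : pairs[i].substring(eq + 1);
    if (decodeField(rawName) === fieldName) {
      return decodeNumericEntities(decodeField(rawValue));
    }
  }
  return null;
}
```

On the results page you would pass window.location.search as the first argument and assign the result to the form field's value property.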
--Ben
> Great, thank you Ben. I just didn't know what characters are part of
> those languages.
>
> Now, I have another problem...
>
> First of all, the technology being used is only static HTML with Apache
> SSI. No chance to change that at all.
>
> I'm using javascript to extract the form variables from the URL and
> pre-populate the form. So, the user puts their query in the form,
> submits to a CGI script (that I can't edit), and on the results page
> their query is pre-populated in the form so they can search again. The
> problem is that if a special character is entered, the URL encodes it
> and you end up with some pretty ugly stuff in the form. I put in a
> Polish word and got back a lot of junk. It looks like HTML entities.
>
> What can I do with just javascript about decoding these entities so I
> can put them in as the original pretty characters in the form? Am I
> going to have to get some kind of chart of HTML entities for these
> non-English characters and do some conversions? Or is there a simpler
> way?
>
> Diane
>
> On Wednesday, January 7, 2004, at 11:07 PM,
> thelist-request at lists.evolt.org wrote:
>
> > Hi Diane,
> >
> > I did some random typing in of Alt-key-plus-number-pad combinations to
> > guesstimate the range of characters you should include in your regexp to
> > check for word characters in the languages you list below. It looks like
> > the extended characters all live in the 128 to 165 range (for example,
> > Alt+128 is Ç, and Alt+165 is Ñ). So if you check for
> > [a-zA-ZÇ-Ñ0-9] in your regexp, you should be covered.
> >
> > --Ben
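
A caveat on the regexp range quoted above, with a small sketch. The Alt+128 to Alt+165 numbers are positions in the DOS code page, but a JavaScript character class compares Unicode values, where Ç is U+00C7 and Ñ is U+00D1. So Ç-Ñ only spans eleven uppercase characters and misses lowercase accented letters and the Polish ones. The wider range below (U+00C0 to U+024F) is an assumption about what is needed, not a tested list for every language; it also lets through the × and ÷ signs. The names narrow, wider, and isWord are hypothetical.

```javascript
// The range from the quoted message, written with escapes to avoid
// file-encoding problems: \u00C7 is Ç and \u00D1 is Ñ.
var narrow = /^[a-zA-Z\u00C7-\u00D10-9]+$/;

// Assumed wider range: Latin-1 Supplement letters through Latin Extended-B.
var wider = /^[a-zA-Z0-9\u00C0-\u024F]+$/;

// Hypothetical helper: true when every character in text is allowed by pattern.
function isWord(text, pattern) {
  return pattern.test(text);
}
```

With these, isWord("garçon", narrow) fails because ç is U+00E7, while isWord("garçon", wider) passes.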