[thelist] tip: Persuading browsers to send UTF-8

Andrew Chadwick andrew.chadwick at prnewswire.co.uk
Tue Sep 18 05:33:15 CDT 2001

My last post was too long, so here's a tip to try and make up for it.

<tip audience="cgi-heads, html-folks, all" subject="UTF-8">

If your CGI application would like its form input in UTF-8 (because
that's what needs to go in the database, say), there are one or two
things you can do to make your life easier:

 * Make sure the document you're serving up is served as UTF-8 by the
   web server. In Perl with CGI.pm, that's
   'print $cgi->header(-charset=>"utf-8");'. For you NPH guys, or people
   who like hacking Apache config files, the header to send is:

      Content-Type: text/html; charset=utf-8

   This persuades most modern browsers to send back UTF-8.

 * Say '<form accept-charset="utf-8">' for your form open tag. Now
   every HTML4+-compliant browser should send back UTF-8.
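Here's a minimal sketch of both steps together in Perl. It prints the charset header by hand, so it runs without CGI.pm installed, and emits a form with accept-charset set. The action 'submit.cgi' and the field name 'comment' are placeholders, not anything from a real application.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response whose HTTP header and <form> tag both
# ask for UTF-8. The form action and field name are hypothetical.
sub utf8_form_page {
    my ($action) = @_;

    # Header written by hand; with CGI.pm, $cgi->header(-charset=>"utf-8")
    # produces an equivalent Content-Type line.
    my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";

    my $body = <<"HTML";
<html>
<head><title>Comments</title></head>
<body>
<form method="post" action="$action" accept-charset="utf-8">
<input type="text" name="comment">
<input type="submit" value="Send">
</form>
</body>
</html>
HTML

    return $header . $body;
}

print utf8_form_page('submit.cgi');
```

Belt and braces: the header covers browsers that echo the page's charset, and accept-charset covers the HTML4 ones that honour the attribute.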

The above is good, but old browsers are still going to send their
native character set (and not tell you about it in any reliable or
even vaguely portable way). For those you'll need a more devious
technique based on hidden form fields, for example one holding a known
non-ASCII character whose returned bytes give the encoding away. Decide
what the incoming data is encoded as, and then use a module like
Unicode::MapUTF8 to remap it to UTF-8.
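A minimal sketch of the hidden-field trick, assuming the form carries a hidden field (the name 'charset_check' is made up) whose value is the single character U+00E9 (e-acute). It uses Perl's core Encode module in place of Unicode::MapUTF8, and only tells UTF-8 apart from Latin-1/cp1252; any other encoding you care about would need its own byte pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed form markup (hypothetical field name):
#   <input type="hidden" name="charset_check" value="&eacute;">
# The browser submits U+00E9 in whatever encoding it is using, so the
# raw bytes of that one field reveal the encoding of the whole form.
sub guess_form_charset {
    my ($sniffer_bytes) = @_;
    return 'UTF-8'  if $sniffer_bytes eq "\xC3\xA9";  # U+00E9 as UTF-8
    return 'cp1252' if $sniffer_bytes eq "\xE9";      # U+00E9 as Latin-1/cp1252
    return;                                           # unrecognised: don't guess
}

# Decode a raw form value from the detected charset, then re-encode it
# as UTF-8 bytes ready for the database.
sub form_value_to_utf8 {
    my ($raw_bytes, $charset) = @_;
    return encode('UTF-8', decode($charset, $raw_bytes));
}

# Example: an old Latin-1 browser submitted "caf" plus e-acute.
my $charset = guess_form_charset("\xE9");
if (defined $charset) {
    my $utf8 = form_value_to_utf8("caf\xE9", $charset);
    printf "%s: %s\n", $charset, unpack('H*', $utf8);  # hex dump of the bytes
}
```

cp1252 is used for the single-byte case because it is a superset of Latin-1 for printable characters, and older Windows browsers often sent it while claiming ISO-8859-1. In a real CGI.pm script the raw bytes would come from something like $cgi->param('charset_check'); CGI.pm hands back undecoded bytes unless you ask it to decode UTF-8 for you.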

</tip>

Andrew Chadwick, UNIX/Internet Programmer, PR Newswire Europe, Oxford