[thelist] UTF-8/FormMail/PHP headaches

Peter Johansson peter at johansson.org
Fri May 24 01:02:01 CDT 2002


On Thu, 23 May 2002, Andrew Clover wrote:

> UTF-8 encodes all extended characters as a character code between 0xC0 and
> 0xFF followed by a number of characters in the range 0x80 to 0xBF (the
> number depends on the first character). So a simple check is to see if there
> are any 0xC0-0xFF characters not followed by 0x80-0xBF, or any 0x80-0xBF
> characters not preceded by an 0xC0-0xFF. In either case you know you don't
> have UTF-8 and you can try a different encoding.
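
A minimal sketch of the check Andrew describes, assuming Python purely for illustration (the thread is about PHP). The function names looks_like_utf8 and decode_form_value are hypothetical. Falling back to ISO-8859-1 is an assumption about what "a different encoding" means for a given site. The sketch refines the literal wording in one way. A lead byte can be followed by more than one 0x80-0xBF byte, so the code reads the lead byte to work out how many continuation bytes to expect, rather than requiring every continuation byte to sit directly after a lead byte. It also assumes the modern four-byte limit from RFC 3629, so 0xF8-0xFF are rejected outright.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Structural UTF-8 check: every lead byte must be followed by the
    right number of 0x80-0xBF continuation bytes, and no continuation
    byte may appear on its own."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            i += 1  # plain ASCII byte, stands alone
            continue
        if b <= 0xBF:
            return False  # stray continuation byte with no lead byte
        if b >= 0xF8:
            return False  # assumption: no 5/6-byte forms (RFC 3629)
        # The lead byte says how many continuation bytes must follow.
        need = 3 if b >= 0xF0 else 2 if b >= 0xE0 else 1
        if i + need >= n:
            return False  # sequence truncated at end of input
        for j in range(i + 1, i + need + 1):
            if not 0x80 <= data[j] <= 0xBF:
                return False  # lead byte not followed by continuation
        i += need + 1
    return True


def decode_form_value(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode a submitted form value as UTF-8 if it passes the check,
    otherwise with the (assumed) fallback encoding."""
    if looks_like_utf8(data):
        try:
            # The strict decoder also rejects overlong forms and
            # surrogates, which the byte-pattern check lets through.
            return data.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # With the default ISO-8859-1 fallback every byte maps to a
    # character, so this decode cannot fail.
    return data.decode(fallback)
```

For example, "åäö" sent as UTF-8 (C3 A5 C3 A4 C3 B6) passes the check. The same text sent as ISO-8859-1 (E5 E4 F6) fails at the first byte, because 0xE5 is not followed by a continuation byte. In both cases decode_form_value returns "åäö". The byte-pattern check alone does not catch everything; overlong forms such as 0xC0 0x80 pass it. For that reason the sketch still lets the strict UTF-8 decoder have the final word before falling back.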

Thanks a lot for this thorough explanation, Andrew; it really shed some
light on the problem for me. It's too bad that browsers don't send the
charset as they're supposed to. It would make life so much easier when
using scripts like this on multilingual sites (and on other sites as
well, of course).

But thanks to your post I should be able to tackle the problem in a
somewhat more elegant way.

Regards,
Peter
