[thelist] character encoding & validation

Andrew Clover and at doxdesk.com
Fri Jul 26 12:44:01 CDT 2002


Duncan O'Neill <dbaxo at ihug.co.nz> wrote:

> The validator is complaining about the character-
> encoding. The server is spitting it out by default as
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-16">

That's no good. Setting Content-Type through a meta http-equiv hack
can only work if the browser can read the <meta> tag itself. In most
encodings (utf-8, iso-8859-1, shift_jis, etc.) that's okay, because
they are ASCII-compatible: characters like '<', 'm', 'e' and so on are
encoded as exactly the same bytes as in us-ascii.
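
To see this concretely, here is a small Python sketch (Python is just my
choice for illustration, and the codec names are the ones in its standard
library). It encodes the start of a meta tag in each of these encodings and
compares the bytes with the plain us-ascii form:

```python
# Encode the opening of a <meta> tag in several ASCII-compatible encodings
# and check whether the bytes match the plain us-ascii form.
SNIPPET = '<meta http-equiv="Content-Type"'

ascii_bytes = SNIPPET.encode("ascii")

same_as_ascii = {
    name: SNIPPET.encode(name) == ascii_bytes
    for name in ("utf-8", "iso-8859-1", "shift_jis")
}

for name, same in same_as_ascii.items():
    print(name, "matches us-ascii:", same)
```

All three should report a match, which is why a browser can find the tag
before it knows which of these encodings is in use.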

This is not the case for UTF-16, which is built from 16-bit (two-byte)
code units. A '<' would be represented by a '<' byte and a zero byte (in
either order, depending on byte order), which would prevent a browser
from recognising the tag. If you want to use UTF-16 as a web page
encoding, you must specify that *before* the document begins, by using a
real HTTP header instead of a meta hack. In ASP you can do this with:

  <% Response.ContentType="text/html; charset=utf-16" %>

before the page begins.
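
For comparison, a quick Python sketch of what the tag looks like in UTF-16.
The codec names are Python's; "utf-16-le" and "utf-16-be" are the two byte
orders without a byte-order mark. Something scanning for the ASCII bytes of
"<meta" will not find them in either form:

```python
TAG = "<meta"

le = TAG.encode("utf-16-le")   # low byte first: '<' then a zero byte
be = TAG.encode("utf-16-be")   # high byte first: a zero byte then '<'

print(le)
print(be)

# Look for the plain ASCII byte sequence inside each UTF-16 form.
found_in_le = b"<meta" in le
found_in_be = b"<meta" in be
print("ASCII '<meta' found:", found_in_le, found_in_be)
```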

HOWEVER. The page whose URL you posted is not actually UTF-16 encoded;
did you mean UTF-8? Any browser that attempts to follow your charset
declaration will display complete gibberish.
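
To illustrate the gibberish: the sketch below takes a hypothetical scrap of
markup (not your actual page), stores it as ASCII-compatible bytes, and then
decodes those bytes as little-endian UTF-16, which is roughly what a browser
obeying the declaration would do. The byte order is my assumption here.

```python
markup = "<html>"                    # hypothetical sample, not the real page
raw = markup.encode("utf-8")         # the bytes as actually stored
misread = raw.decode("utf-16-le")    # the bytes as the declaration says to read them

print(len(markup), "characters became", len(misread))
print([hex(ord(ch)) for ch in misread])
```

Each pair of bytes gets glued into one unrelated character, so none of the
original markup survives.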

--
Andrew Clover
mailto:and at doxdesk.com
http://and.doxdesk.com/


