[Javascript] Re: Charset (sort-of OFF_TOPIC for js)

tedd tedd at sperling.com
Mon Aug 14 09:34:54 CDT 2006


At 2:47 PM +0100 8/14/06, David Dorward wrote:
>On Mon, Aug 14, 2006 at 09:39:16AM -0400, tedd wrote:
>
>>  However, the whole situation seems a bit redundant, doesn't it? You
>>  have to both save your document in the right charset encoding AND
>>  then again restate the charset encoding in the document. Why twice?
>
>First, it should be pointed out that sticking the details of the
>encoding in a <meta> element and expecting the client to read that is
>a nasty hack (which comes with a bunch of provisos).

So using a meta tag for encoding is a bad thing?

Please understand that I'm just trying to figure out what's good and 
how to do it.

---

>The proper place for this information is in the HTTP header sent by
>the server before the document itself.
>
>That way the server says "Hello! Here is a UTF-8 document" and then
>the browser can read it knowing it is UTF-8.

Okay, so how does one do that?

Let's make this simple for me. I'm preparing a web page -- what do I 
do? Where is it that I can make the HTTP header do anything? Do I use 
a .htaccess file, click my heels together three times, or what?
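
For illustration only, here is a minimal sketch of what "the server says
it is UTF-8" can look like when a script generates the page. It uses
Python's standard http.server module; the names build_response, Handler
and PAGE are made up for this example and are not from the thread.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical page content; any text with a non-ASCII character will do.
PAGE = "<p>caf\u00e9</p>"

def build_response(text, charset="utf-8"):
    """Encode the page and build headers that declare the same charset."""
    body = text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The charset is announced in the header, before any page bytes.
        headers, body = build_response(PAGE)
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

For static files on Apache, a comparable effect is commonly obtained
with the directive "AddDefaultCharset utf-8" in a .htaccess file,
assuming the host allows that override.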

---

>> Is it that some browsers can detect the encoding and others can't?
>
>Detecting the encoding comes down to guesswork. You get situations
>such as:
>
>   "The boy ate a chip"
>
>Which, in a document authored by an American would probably mean "The
>boy ate a crisp." in British English. However, if it was authored by a
>Brit, then in American English it would probably mean "The boy ate a
>french fry."
>
>The same byte can mean different things in different encodings,
>without making either encoding obviously wrong.
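
A small sketch of that last point, assuming Python and its built-in
codecs (the byte value 0xE9 and the variable names are chosen only for
illustration): the same single byte decodes to a different character
under two legacy encodings, and neither result is obviously wrong.

```python
# One byte, as it might arrive with no declared encoding.
raw = b"\xe9"

# Read as ISO-8859-1 (Latin-1), 0xE9 is "e with acute accent".
as_latin1 = raw.decode("latin-1")

# Read as Windows-1251 (Cyrillic), the same byte is "short i".
as_cp1251 = raw.decode("cp1251")

# Read as UTF-8, a lone 0xE9 is an incomplete sequence and is rejected.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_latin1), repr(as_cp1251), valid_utf8)
```

Both legacy decodings succeed, which is why a browser left to guess
can pick the wrong one without noticing.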

I understand there are differences in diction between dialects; that 
goes without saying, and I don't think that's the problem.

The encoding problem/solution is simply one of expanding the Internet 
from the limited assortment of characters provided by 7-bit ASCII to 
8-bit (and higher) Unicode code points. It's not the language, but 
rather the expanded number of code points that are made available for 
use in web documents. Right?

All I'm trying to do here is to sort out *what* should be done and 
*where* it should be done (i.e., document encoding, meta-tag, HTTP 
headers, carrier pigeon, or what combination thereof).

I would like to think that this very basic thing shouldn't be this elusive.

Thanks for your time.

tedd
-- 
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com


