[thelist] embarrassingly stupid XHTML questions

Andrew Chadwick andrew.chadwick at prnewswire.co.uk
Tue Sep 18 04:35:22 CDT 2001


On Mon, Sep 17, 2001 at 02:17:45PM -0500, Green, Janet wrote:
[1 and 2 have already been answered]
> 3. What are unicode and UTF-8, and how do I know what I am using? I've never
> made a conscious decision on this. 
> 
> Please post answers in "for dummies" lingo. Thanks. 


Unicode is the name of a huge international character set -
essentially a really big alphabet with letters, numbers and symbols
drawn from just about every human language in the world. You need a
lot of characters for that, so Unicode has room for 1,114,112 of them
(wow). The older 31-bit ISO 10646 design allowed for 2,147,483,648.

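UTF-8 is just one of the ways you can encode[1] a Unicode document,
and it's what XML (and so XHTML) documents are assumed to use unless
they say otherwise. Plain HTML has no such dependable default, so it's
worth declaring the encoding there.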
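
To make "encode" a bit more concrete, here is a small sketch in Python
(my choice of language, nothing from the original question; the helper
name utf8_bytes is made up for illustration). It shows the bytes UTF-8
produces for a few sample characters:

```python
# Each character is one Unicode code point; UTF-8 stores it as 1 to 4 bytes.
samples = ["A", "é", "€", "한"]

def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a list of hex strings."""
    return [format(b, "02x") for b in ch.encode("utf-8")]

for ch in samples:
    # Print the code point number next to the bytes that would go to disk.
    print(f"U+{ord(ch):04X} -> {' '.join(utf8_bytes(ch))}")
```

The plain letter takes one byte, the accented letter two, and the Euro
sign and the Korean syllable three each.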


If your document is mainly written in English and/or other European
languages, and you're not fussed about accent symbols, don't worry
about UTF-8. UTF-8 is byte-for-byte identical to plain ASCII for the
characters you can see on a US keyboard (except the Euro symbol, if
you have one there).
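
As for "how do I know what I am using", one rough way to find out is to
look at the bytes in the file. The sketch below is only a heuristic and
assumes you can read the file as raw bytes; the function name classify
and its three labels are invented for this example. Bytes that happen
to be valid UTF-8 are very likely UTF-8, but that isn't proof.

```python
def classify(data: bytes) -> str:
    """Guess whether raw bytes are plain ASCII, UTF-8, or something else."""
    try:
        data.decode("ascii")
        return "ascii"      # only US-keyboard style characters
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"      # non-ASCII characters in valid UTF-8 sequences
    except UnicodeDecodeError:
        return "other"      # probably a legacy encoding such as ISO-8859-1

# Plain English text is the same bytes in ASCII and in UTF-8.
assert "Hello".encode("ascii") == "Hello".encode("utf-8")

print(classify(b"Hello"))
print(classify("€5".encode("utf-8")))
print(classify("café".encode("latin-1")))
```

To try it on a real file, something like
classify(open("page.html", "rb").read()) would do, where page.html is a
stand-in for your own document.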

If your documents are written in a non-Roman alphabet like Greek,
Hebrew, Cyrillic, or Korean[2], then you probably want to consider
UTF-8 as an encoding, especially if you're mixing those scripts in
with English or with each other.
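
Here is a small Python sketch of why mixing scripts pushes you towards
UTF-8. The sample text and the helper can_encode are made up for
illustration, and ISO-8859-7 (a Greek-only encoding) just stands in for
"some older single-script encoding":

```python
# One string mixing Latin, Greek, Hebrew, Cyrillic and Korean text.
mixed = "English, Ελληνικά, עברית, Русский, 한국어"

def can_encode(text, codec):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# UTF-8 covers all of Unicode, so the mixed text survives a round trip.
assert mixed.encode("utf-8").decode("utf-8") == mixed

print(can_encode(mixed, "utf-8"))           # all scripts at once
print(can_encode("Ελληνικά", "iso8859_7"))  # Greek alone in a Greek encoding
print(can_encode(mixed, "iso8859_7"))       # Greek encoding, mixed scripts
```

The Greek-only encoding copes with Greek by itself but not with the
Hebrew, Cyrillic, or Korean parts, whereas UTF-8 handles the lot.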


You can learn far, far more theory than you or I will ever need to
know at <http://czyborra.com/>, and there's a slightly more user-
friendly FAQ at <http://www.cl.cam.ac.uk/~mgk25/unicode.html>. They're
both geared towards Unix machines and crazy Linux commies like me, but
they're useful from a non-denominational perspective too.


[1] Save to disk in a way the computer can understand next time you
    load it in.

[2] Not really an alphabet but, um, never mind :)

-- 
Andrew Chadwick, UNIX/Internet Programmer, PR Newswire Europe, Oxford



