[thelist] embarrassingly stupid XHTML questions

Andrew Chadwick andrew.chadwick at prnewswire.co.uk
Tue Sep 18 04:35:22 CDT 2001

On Mon, Sep 17, 2001 at 02:17:45PM -0500, Green, Janet wrote:
[1 and 2 have already been answered]
> 3. What are unicode and UTF-8, and how do I know what I am using? I've never
> made a conscious decision on this. 
> Please post answers in "for dummies" lingo. Thanks. 

Unicode is the name of a huge international character set -
essentially a really big alphabet with letters, numbers and symbols
drawn from just about every human language in the world. You need a
lot of characters for that, so Unicode has room for 1,114,112 of them
(wow); the 2,147,483,648 figure you sometimes see is the larger space
that ISO 10646 originally reserved.

UTF-8 is just one of the ways you can encode[1] a Unicode document,
and it's what XML documents (XHTML included) are assumed to use unless
they say otherwise.
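
To make the split between "character set" and "encoding" concrete, here
is a minimal Python 3 sketch. Python, the helper name describe() and the
three sample characters are my own choices for illustration, not
anything from the original question. Each character has one Unicode
number; UTF-8 decides which bytes end up on disk.

```python
# Each character has a single Unicode number (its "code point").
# An encoding such as UTF-8 turns that number into bytes for storage.
# The samples are written as escapes so the file itself stays ASCII:
# "A", e-acute and the Euro sign (all chosen for illustration).
samples = ["A", "\u00e9", "\u20ac"]

def describe(ch):
    """Return (code point as U+XXXX, UTF-8 bytes) for one character."""
    return "U+%04X" % ord(ch), ch.encode("utf-8")

for ch in samples:
    code_point, utf8_bytes = describe(ch)
    print(code_point, utf8_bytes)
```

The plain letter needs one byte, the accented letter two, and the Euro
sign three, which is why the encoding matters once you leave English.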

If your document is mainly written in English and/or other European
languages, and you're not fussed about accent symbols, don't worry
about UTF-8. For the characters you can see on a US keyboard, UTF-8 is
identical to plain ASCII (the exception is the Euro symbol, if you
have one there).
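
The compatibility point above can be checked directly. This is only a
sketch in Python 3, and the sample sentence is made up for the purpose:
text that sticks to US-keyboard characters comes out as the same bytes
whether you call it ASCII or UTF-8, while the Euro sign does not fit in
ASCII at all.

```python
text = "Plain English text, typed on a US keyboard."  # made-up sample

# Pure ASCII text produces exactly the same bytes in ASCII and UTF-8.
same_bytes = text.encode("ascii") == text.encode("utf-8")

euro = "\u20ac"  # the Euro sign, which is not an ASCII character
euro_length = len(euro.encode("utf-8"))  # bytes UTF-8 needs for it

print(same_bytes, euro_length)
```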

If your documents are written in a non-Roman alphabet like Greek,
Hebrew, Cyrillic, or Korean[2], then you probably want to consider
UTF-8 as an encoding, especially if you're mixing those scripts
together or with English.
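
As a rough illustration of why mixed scripts push you towards UTF-8,
the Python 3 sketch below puts English, Greek, Cyrillic and Korean in
one string (sample text of my own choosing). UTF-8 stores and reloads
it unchanged; ISO-8859-1, used here as an example of an older
Western European encoding, cannot hold it at all.

```python
# One string mixing four scripts (sample text chosen for illustration).
mixed = "English, Ελληνικά, Русский, 한글"

# UTF-8 can store every character, and decoding gives the text back.
utf8_bytes = mixed.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ISO-8859-1 only covers Western European characters, so this fails.
try:
    mixed.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(round_trip == mixed, fits_latin1)
```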

You can learn far, far more theory than you or I will ever need to
know at <http://czyborra.com/>, and there's a slightly more user-
friendly FAQ at <http://www.cl.cam.ac.uk/~mgk25/unicode.html>. They're
both geared towards Unix machines and crazy Linux commies like me, but
they're useful from a non-denominational perspective too.

[1] Save to disk in a way the computer can understand next time you
    load it in.

[2] Not really an alphabet but, um, never mind :)

Andrew Chadwick, UNIX/Internet Programmer, PR Newswire Europe, Oxford