[thelist] Unicode
Alan Wood
alan.wood at context.co.uk
Thu Aug 1 03:17:01 CDT 2002
Joel Konkle-Parker wrote:
> I'd like to start writing my pages in Unicode, but I'm not sure I know
> how.
>
> First of all, my editor (jedit), has two options for encoding: 'Unicode'
> and 'UTF-8'. I think I know that utf-8 is a compressed subset of unicode.
> Anyway, I have the following page saved as UTF-8:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <title>something</title>
> </head>
> <body>
> డ
> </body>
> </html>
>
> When I load that in Mozilla though, with character encoding set to UTF-8,
> all that comes up is a "?". That code is supposed to bring up some weird
> african symbol, if the unicode code charts are telling the truth.
>
> What am I doing wrong? Also, what's the difference between unicode and
> utf-8? Which should I use? How you do specify a unicode encoding? more
> questions of the same type, i'm sure you can anticipate...
>
I don't think you are doing anything wrong. You probably don't have a font
installed that includes Telugu (an Indian language, not an African one).
Mozilla and IE 5/6 for Windows should be able to display Telugu if you have
a suitable font installed.

My test page at:
http://www.alanwood.net/unicode/telugu.html
lists 4 fonts that should enable you to display Telugu.

UTF-8 is one specific encoding of Unicode (it covers all of Unicode, not a
subset), and is the normal Unicode encoding for Web pages. In your editor's
list of encodings, "Unicode" probably means UTF-16, an encoding that is not
normally used for Web pages.
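
To make the difference concrete, here is a minimal sketch in Python (my own
illustration, not anything taken from jEdit or Mozilla). It assumes the
character in your page is U+0C21, and it picks big-endian UTF-16 without a
byte order mark purely to keep the output short. The same code point comes
out as different bytes in each encoding, and both decode back to the same
character:

```python
import unicodedata

# Assumption: the character in the quoted page's <body> is U+0C21.
ch = "\u0c21"

code_point = ord(ch)                    # the Unicode code point, as an integer
char_name = unicodedata.name(ch)        # its name in the Unicode character database
utf8_bytes = ch.encode("utf-8")         # the bytes a UTF-8 file would contain
utf16_bytes = ch.encode("utf-16-be")    # the bytes in big-endian UTF-16, no BOM

print(f"U+{code_point:04X} {char_name}")
print("UTF-8 :", utf8_bytes.hex())
print("UTF-16:", utf16_bytes.hex())

# Both byte sequences represent the same character, so neither encoding
# is a subset of the other.
same_character = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be")
print("Same character after decoding:", same_character)
```

For this character UTF-8 needs three bytes and UTF-16 needs two, so UTF-8
is not a "compressed" form either; it is simply a different way of writing
the same code points as bytes.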
Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names)