[thelist] Unicode

Alan Wood alan.wood at context.co.uk
Thu Aug 1 03:17:01 CDT 2002


Joel Konkle-Parker wrote:

> I'd like to start writing my pages in Unicode, but I'm not sure I know
> how.
>
> First of all, my editor (jEdit) has two options for encoding: 'Unicode'
> and 'UTF-8'. I think I know that UTF-8 is a compressed subset of Unicode.
> Anyway, I have the following page saved as UTF-8:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
>   <head>
>     <title>something</title>
>   </head>
>   <body>
>     &#x0C21;
>   </body>
> </html>
>
> When I load that in Mozilla, though, with the character encoding set to
> UTF-8, all that comes up is a "?". That code is supposed to bring up some
> weird African symbol, if the Unicode code charts are telling the truth.
>
> What am I doing wrong? Also, what's the difference between Unicode and
> UTF-8? Which should I use? How do you specify a Unicode encoding? More
> questions of the same type, I'm sure you can anticipate...
>
I don't think you are doing anything wrong.  You probably don't have a font
installed that includes Telugu (an Indian language, not African).  Mozilla
and IE 5/6 for Windows should be able to display Telugu if you have a
suitable font.

My test page at:

http://www.alanwood.net/unicode/telugu.html

lists 4 fonts that should enable you to display Telugu.
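To check which script a code point belongs to, Python's standard `unicodedata` module can report the character's official Unicode name. This is a small illustrative sketch (not part of the original thread), using the character reference from the question:

```python
import html
import unicodedata

# The character reference from the question, "&#x0C21;", as a code point.
code_point = 0x0C21
char = chr(code_point)

# The numeric character reference resolves to that same character.
assert html.unescape("&#x0C21;") == char

# unicodedata gives the character's official Unicode name.
print(f"U+{code_point:04X} {unicodedata.name(char)}")
```

It prints `U+0C21 TELUGU LETTER DDA`, confirming the character is Telugu. The name only identifies the character; it says nothing about whether an installed font can actually draw it, which is why the browser shows "?".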

UTF-8 is one specific type of encoding for Unicode (all, not a subset), and
is the normal Unicode encoding for Web pages.  In this case, "Unicode" is
probably used to mean UTF-16, an encoding that is not normally used for Web
pages.
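A short Python sketch (illustrative, not from the original thread) makes the difference concrete by encoding the same two characters, an ASCII letter and the Telugu character from the question, both ways. It uses `utf-16-be` (big-endian, no byte-order mark) so that only the character bytes are shown:

```python
# Compare how UTF-8 and UTF-16 encode the same characters.
samples = ["A", "\u0c21"]  # ASCII letter A, and U+0C21 TELUGU LETTER DDA

for ch in samples:
    utf8 = ch.encode("utf-8")
    # "utf-16-be" = big-endian UTF-16 without a byte-order mark;
    # plain "utf-16" would prepend a BOM.
    utf16 = ch.encode("utf-16-be")
    print(f"U+{ord(ch):04X}  UTF-8: {utf8.hex(' ')}  UTF-16: {utf16.hex(' ')}")
    # Either encoding round-trips to the same character.
    assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == ch
```

Expected output:

    U+0041  UTF-8: 41  UTF-16: 00 41
    U+0C21  UTF-8: e0 b0 a1  UTF-16: 0c 21

Plain ASCII text is byte-for-byte unchanged in UTF-8, which is one reason it suits Web pages, while UTF-16 uses two bytes even for "A". Both encodings cover all of Unicode; they just lay out the bytes differently.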

Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names)



