[thelist] encoding question: utf8 or big5 for chinese?

Ben Gustafson Ben_Gustafson at lionbridge.com
Fri Mar 8 08:48:01 CST 2002


> we are considering buying a software. it will be used
> by completely chinese users who do not speak english,
> and the data will be entirely in chinese. usually, at
> least on a browser, the charset encoding we use is
> big5.
>
> the software vendor tells me that they support "UTF8"
> but not big5. can someone tell me what this means?
>
> i mean, i can imagine that there are two things: the
> GUI of a software and the data. the data should be ok
> because we are sure that we will use oracle and oracle
> can support chinese content. but what is UTF8 support?
>
> i know i can ask the vendor for what they mean, but i
> need to appear intelligent. i searched google for UTF8
> and did not much understand from utf8.com what it is
> really about.
>
> thanks/erick

Erick,

UTF-8 is a "flavor," or encoding, of Unicode. There are other encodings,
such as UCS-2 (the internal Unicode encoding in SQL Server). The major
benefit of using Unicode as your character set is that it supports a very
wide range of written languages, including Traditional Chinese and just
about any other written language you can think of. For software vendors,
this means that they don't have to support each of the various "native"
codepages (such as Big5) separately; by supporting Unicode they in effect
cover the characters of all of them.
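
To make the codepage-versus-Unicode distinction concrete, here is a small
Python sketch (Python is my choice for illustration, not something the
vendor or the original question mentions). It encodes the same two Chinese
characters as UTF-8 and as Big5 using the codecs that ship with Python. The
sample string is an assumption picked for the example.

```python
# Same text, two different byte representations.
text = "中文"  # sample string, chosen only for illustration

utf8_bytes = text.encode("utf-8")  # Unicode encoding: 3 bytes per character here
big5_bytes = text.encode("big5")   # native codepage: 2 bytes per character here

print("UTF-8:", utf8_bytes.hex(" "), f"({len(utf8_bytes)} bytes)")
print("Big5: ", big5_bytes.hex(" "), f"({len(big5_bytes)} bytes)")

# Decoded with the matching codec, both byte strings give back the same text.
assert utf8_bytes.decode("utf-8") == text
assert big5_bytes.decode("big5") == text
```

The characters are the same either way; only the bytes on disk or on the
wire differ, which is why software has to know which encoding it is reading.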

UTF-8 is commonly used for multilingual Web applications. (In fact, if you
try serving pages to a browser in another Unicode encoding, you'll probably
get a bunch of garbage characters.) If you go to www.lionbridge.com and view
the site in any of its 10 languages, you'll see the same charset meta tag in
the source:

	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

Although you mention that your product will be only in Chinese, one of the
cool things about Unicode-enabled Web sites is that you can display a number
of different languages, including Asian languages, on the same page. If you
use a native codepage, such as Big5, you can't display another Asian
language on that page, because the codepage has no codes for that
language's characters, and the languages that use a Latin alphabet have that
ugly single-byte-characters-in-a-double-byte-codepage look.
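
A rough way to see the mixed-language limitation, sketched in Python with
its built-in codecs: try to encode a string that mixes Traditional Chinese
with Korean. The helper name can_encode and the sample strings are
hypothetical, made up for this example.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text has a code in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

chinese_only = "繁體中文"
mixed_page = "繁體中文 / 한국어"  # Traditional Chinese plus Korean on one "page"

print(can_encode(chinese_only, "big5"))   # Chinese alone fits in Big5
print(can_encode(mixed_page, "big5"))     # Korean Hangul has no Big5 codes
print(can_encode(mixed_page, "utf-8"))    # UTF-8 covers both scripts
```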

To my knowledge, Oracle supports Unicode, but I'm not sure which encoding. I
imagine your GUI and data encodings will need to match, so you may need to
convert your data to whatever encoding the GUI uses. That's a question
you can ask your vendor (and appear intelligent while doing so).
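
If existing Big5 data does need converting, the general idea is to decode
the bytes with the source codec and re-encode them with the target one. A
minimal Python sketch follows; the function name transcode and its defaults
are hypothetical, and a real migration into Oracle would more likely go
through the database's own tools, which I can't speak to.

```python
def transcode(data: bytes, source: str = "big5", target: str = "utf-8") -> bytes:
    """Re-encode bytes from one character encoding to another.

    Raises UnicodeDecodeError if data is not valid in the source encoding,
    which is usually what you want: silent guessing corrupts text.
    """
    return data.decode(source).encode(target)

# Stand-in for a record stored as Big5 bytes.
big5_record = "繁體中文".encode("big5")
utf8_record = transcode(big5_record)

print(utf8_record.decode("utf-8"))
```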

Now you can go wade through the stuff in http://www.imc.org/rfc2279 ;).

--Ben

________________
Ben Gustafson
Webmaster
Lionbridge Technologies, Inc.
www.lionbridge.com



