[fwd] Re: [thelist] Building a CMS which handles the Chinese language

Sub sub at shanx.com
Fri May 18 21:55:46 CDT 2001

Hi Isaac

You are right, it's Big5 and GB2312.

I may stand corrected, but Chinese, Japanese, Korean etc., as you mention,
should be a data or a presentation issue - not something that the CMS has
to "handle/manage" - all it needs to do is serve as an intermediary for
the data (take it from here, put it there).

Maybe I did not get your question in all its depth, but in general, it is a
question of where you save your data - if it's a database, your data should
be OK if it is put in as it is (in some cases, such as Oracle, you might
have to change the NLS character set in the parameters, but these issues are
well-documented), and if it is XML or some other such mechanism, your
CMS should work just fine because, XML or no XML, it is a flat file!
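
To make the "take it from here, put it there" idea concrete, here is a rough
Python sketch of a store that never interprets the text, it only moves bytes.
The function names and the file name are hypothetical, and the sample bytes
are part of the line quoted further down, which I am only assuming is GB2312.

```python
import os
import tempfile

# A few bytes taken from the sample in the quoted message (assumed GB2312).
SAMPLE = bytes.fromhex("d2c1c0cab9dabefc")


def store(path: str, payload: bytes) -> None:
    """Write the submitted bytes exactly as received (binary mode, no decoding)."""
    with open(path, "wb") as f:
        f.write(payload)


def load(path: str) -> bytes:
    """Read the stored bytes back without interpreting them."""
    with open(path, "rb") as f:
        return f.read()


def round_trip(payload: bytes) -> bytes:
    """Store and reload a payload in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "article.txt")  # hypothetical file name
        store(path, payload)
        return load(path)
```

As long as nothing in between decodes or re-encodes the payload, what comes
out is byte-for-byte what went in, whatever the language.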

Second part of the story - will the end-user see the data displayed
correctly? That's an issue on the presentation side, i.e., your HTML
should declare the right charset so the browser can handle it:

   EN:    <meta http-equiv="Content-Type" content="text/html;
   CN:    <meta http-equiv="Content-Type" content="text/html;
   JP:    <meta http-equiv="Content-Type" content="text/html;
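
The tags above are cut off before the charset value. As a sketch only, this
is how a template helper might build the complete tag in Python. The charset
labels in the table are my assumptions about typical choices (they are
registered charset names, but they are not taken from the truncated lines),
and the helper name is hypothetical.

```python
from html import escape

# Assumed labels, NOT recovered from the truncated tags above.
ASSUMED_CHARSETS = {
    "EN": "iso-8859-1",
    "CN": "gb2312",      # Simplified Chinese; "big5" would be the Traditional choice
    "JP": "shift_jis",
}


def content_type_meta(charset: str) -> str:
    """Build a Content-Type meta tag that declares the page's charset."""
    value = escape(f"text/html; charset={charset}", quote=True)
    return f'<meta http-equiv="Content-Type" content="{value}">'


def meta_for(language: str) -> str:
    """Look up the assumed charset for a language code and build its tag."""
    return content_type_meta(ASSUMED_CHARSETS[language])
```

The declared charset has to match the bytes actually served, otherwise the
browser will show the wrong characters no matter what the tag says.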


AFAIK Perl, PHP, ASP & Cold Fusion do not need anything special to handle
this, but you might need to specify the charset in your JSPs because they
are compiled at runtime.

From the content management side, if your users are entering double-byte
text (Chinese, Japanese etc.), I am sure they already have the language input
mechanisms in place on their computers, as well as the "encoding"
set correctly in their browsers... so they should be able to enter content
without the CMS "handling" it. All the CMS has to do (usually*) is to grab
the text and put it where it needs to be put (DB, XML repository, flat
files etc).

* With JSP (sigh!) you might end up doing some character set translations
in the backend before putting in text - only with some applications,
not all - but I have not encountered this problem with Perl/PHP/ASP/CF.
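
For what such a backend translation can look like, here is a small Python
sketch (Python rather than Java, just to keep it short). It assumes the
usual failure mode where submitted bytes were wrongly read as ISO-8859-1,
and it assumes the sample from the quoted message is GB2312. Both are my
assumptions, and the function names are hypothetical.

```python
import quopri

# Part of the sample line from the quoted message, still quoted-printable encoded.
QP_SAMPLE = b"=D2=C1=C0=CA=B9=DA=BE=FC=CB=F8=B6=A8=BD=F0=B1=AD"


def misdecoded(raw: bytes) -> str:
    """What the backend holds if it wrongly reads the bytes as ISO-8859-1."""
    return raw.decode("iso-8859-1")


def repair(text: str, actual: str = "gb2312") -> str:
    """Undo the wrong decode: recover the original bytes, then decode them properly."""
    return text.encode("iso-8859-1").decode(actual)


raw = quopri.decodestring(QP_SAMPLE)  # the 16 original bytes
garbled = misdecoded(raw)             # 16 "strange" Latin-1 characters
fixed = repair(garbled)               # 8 characters, if the GB2312 guess holds
```

The round trip through ISO-8859-1 is lossless because every byte value maps
to exactly one character in that charset, which is why the repair works.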

In sum, I guess it really depends on what platform you are based on. If it
is Java, yes, there is a good deal of work to do the smallest thing - but
that's another story for another day...

Hope I helped! Would love to hear others' experiences..


At 11:31 AM 5/17/2001 +0930, you wrote:

>Is there anyone here with much experience in building Web sites using
>alternative character sets (ie, Chinese = Big5, etc)? Specifically, I'm
>looking at adding multiple-language versions of content to a CMS. I have
>read that there are two character sets (Big5 and GBsomething) that handle
>the Traditional and Simplified character forms. Aside from that, I've not
>been able to find any more details.
>I suspect that languages like French, German, etc will interact with
>databases/form elements easily enough because (AFAIK) they use the same
>characters. What other languages will play happily in this fashion?
>How does it work with other languages? Can you copy and paste Chinese
>characters into form elements? Can you insert/select from the database?
>Viewing source of a page which shows Chinese characters, gives strange
>characters from a standard character set. If I view those in a specific
>font, would they display correctly? Here's an example of the "strange"
>characters (I imagine that they're just extended/non-standard characters
>from the standard character set - is that correct?).
=C1=A6=BD=D2=B5=D7 =D2=C1=C0=CA=B9=DA=BE=FC=CB=F8=B6=A8=BD=F0=B1=AD
>(Having searched thelist archives, the only thread about Chinese Web
>spoke of the "solution" being to use GIF text displaying the characters.)
>I would greatly appreciate any advice/examples/tutorials, etc.
