[thelist] [TIP] - Use UTF-8 whenever possible, or get used to extra doses of caffeine.

Luther, Ron Ron.Luther at hp.com
Thu May 11 07:53:49 CDT 2006


T. R. Valentine asked:


>>I'm not sure if Jeroen really means 'special characters' (e.g. @, #, $, %, ^, etc.)
>>or non-Roman characters such as those found in Chinese, Cyrillic, Greek, and even Latin
>>characters that don't appear on an American-style keyboard.

>>If the former, then I quite agree. But if the latter, I do not understand the objection.


Hi T.R.,


My question dealt with the latter - the non-Roman characters.

Let me provide a little more background. I work in IntraNet reporting for a
largish global company headquartered in the US. (If the company were
headquartered in a country where everyone was familiar with multi-byte
character sets, I agree that I would not expect as much of a problem.
It is not.)

Our new (and quite enormous by any measure) data warehouse will be UTF-8.

[Yes, I have been through all of the arguments on 'centralized corporate
reporting needs' versus 'local country reporting needs'. I believe I
understand them. In this context I am *only* talking about the centralized
corporate reporting.]

How do I train users who are not accustomed to diacritical marks to work
with them?

How do I get users to understand that searching for a 'keyword' that
returned all records of interest in the past will no longer do so, because
some data records will have that keyword spelled with different characters
they may not be familiar with?
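One partial mitigation is to offer an accent-insensitive search. Below is a rough sketch in Python, assuming the search runs somewhere we control; `fold` and `keyword_matches` are hypothetical helper names for illustration, not part of any existing reporting tool. It decomposes each string with Unicode NFKD normalization, drops the combining accent marks, and case-folds, so 'Zürich' and 'Zurich' compare equal.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive form of text for loose matching."""
    # NFKD splits precomposed letters such as 'ü' into a base letter ('u')
    # followed by a combining mark (U+0308); the marks are then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() (e.g. 'ß' becomes 'ss').
    return without_marks.casefold()


def keyword_matches(keyword: str, record: str) -> bool:
    """True if the folded keyword occurs anywhere in the folded record."""
    return fold(keyword) in fold(record)


records = ["Zurich office report", "Zürich office report", "Sao Paulo", "São Paulo"]

# A plain substring search misses the accented spelling...
print([r for r in records if "Zurich" in r])
# ['Zurich office report']

# ...while the folded search finds both spellings.
print([r for r in records if keyword_matches("zurich", r)])
# ['Zurich office report', 'Zürich office report']
```

This only helps with accented letters that Unicode actually decomposes. Letters such as 'ø' or 'ł' have no decomposition and pass through unchanged, so a real search would still need an explicit mapping table for those.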

What advice can I give users who currently extract the 7-bit ASCII data and
throw it into MS Access for post-processing about the issues they will run
into when that data is suddenly UTF-8?
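A second sketch, again in Python and again hypothetical (`utf8_size`, `non_ascii_chars` and `misread_as_cp1252` are illustrative names, not part of any tool we use), shows two things such users may trip over. First, a UTF-8 string can take more bytes than characters, which can matter if any field widths or fixed-width layouts were sized in bytes. Second, UTF-8 bytes read as though they were in a single-byte code page come out garbled. I am assuming the Windows-1252 code page purely for illustration; what Access actually does depends on how the extract is written and which code page the import is told to use.

```python
from typing import List, Tuple


def utf8_size(text: str) -> Tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


def non_ascii_chars(text: str) -> List[str]:
    """List the characters outside 7-bit ASCII as 'U+XXXX char' labels."""
    return [f"U+{ord(ch):04X} {ch}" for ch in text if ord(ch) > 0x7F]


def misread_as_cp1252(text: str) -> str:
    """Simulate a tool that reads UTF-8 bytes assuming Windows-1252."""
    # errors="replace" substitutes U+FFFD for the few byte values that
    # Windows-1252 leaves undefined, instead of raising an exception.
    return text.encode("utf-8").decode("cp1252", errors="replace")


for sample in ["Report", "Zürich", "Ελλάδα", "北京"]:
    chars, size = utf8_size(sample)
    print(sample, chars, "chars,", size, "bytes", non_ascii_chars(sample))

print(misread_as_cp1252("Zürich"))  # ZÃ¼rich
```

A check like `non_ascii_chars` could be run over an extract before it goes into Access, so users at least know which rows contain characters they have not had to deal with before.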


Those are some of the concerns I have in-house.

For worldwide ExtraNet or InterNet applications (yes, they are different
things), I have fewer concerns about UTF-8 and would most likely advocate
its use.


Thanks,

RonL.


