[thelist] [TIP] - Use UTF-8 whenever possible, or get used to extra doses of caffeine.

T. R. Valentine trvalentine at gmail.com
Fri May 12 07:06:16 CDT 2006


On 11/05/06, Chris at globet.com <Chris at globet.com> wrote:

> > Why should not people who have keyboards that enter
> > characters with which they are quite comfortable from a
> > language with which they are comfortable be able to enter
> > those characters to access a URI? What is wrong with these
> > potential domains?
> >    www.özçelik.com
> >    www.περιχώρησις.com
> >    www.православная.com
>
> "Wrong" is abstract in this context. What is impractical about
> these potential domains is that it will be more difficult for
> migrants to access services and information when located in
> English-speaking countries, of which there are many. There is a
> difference in practicality between providing correctly encoded (but
> non-standard for the environment) output, and *requiring* correctly
> encoded (but non-standard for the environment) input.

ISTM that migrants located in English-speaking countries who are
computer literate will either have a computer configured to run in
their native language or at least the ability to switch keyboards.
Also, since migrants located in English-speaking countries will always
be a minority, it seems unreasonably restrictive to compel the users of
a language to limit their URIs to the characters found in the English
alphabet.


> > Must the Internet be American-centred?
>
> No, why? Using a certain character set has nothing to do with
> geographical boundaries. If you want to reach the broadest
> cross-section of the global population, then standardisation, not
> diversification, should be the goal.

ISTM the standardisation for URIs ought to be UTF-8. I think it
reasonable to assume that English-only speakers will have no interest
in visiting URIs they cannot even type, such as özçelik.com, and if
they are interested they will know how to use an alternate keyboard.
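
FWIW, as I understand it, internationalised domain names do not need
the DNS itself to carry anything beyond ASCII: under IDNA (RFC 3490)
each non-ASCII label is converted to an ASCII form beginning with
'xn--' before lookup. Here is a minimal sketch in Python using its
built-in "idna" codec. The helper names are mine, purely for
illustration, and I am assuming the codec's IDNA 2003 behaviour is
close enough to what browsers do.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII ('xn--') form of a host name using Python's idna codec."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_hostname: str) -> str:
    """Convert an ASCII ('xn--') host name back to its Unicode form."""
    return ascii_hostname.encode("ascii").decode("idna")


# One of the example domains from this thread.
ascii_name = to_ascii_hostname("www.özçelik.com")
print(ascii_name)                       # the 'xn--' form sent to the DNS
print(to_unicode_hostname(ascii_name))  # what the user typed and sees
```

So the user types the characters on their own keyboard, and only the
software ever deals with the ASCII form.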


> See this article for a nice primer on encoding:
>
> <http://www.joelonsoftware.com/articles/Unicode.html>

Thanks. It is a good primer.




On 11/05/06, Luther, Ron <Ron.Luther at hp.com> wrote:

> My question dealt with the latter - the non-Roman characters.
>
> Let me provide a little more background.  I work in IntraNet reporting
> for a largish global company headquartered in the US.  (If the company
> were headquartered in a country where everyone was familiar with
> multi-byte character sets I agree that I would not expect as much of a
> problem.  It is not.)
>
> Our new (and quite enormous by any measure) data warehouse will be
> UTF-8.

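I thought UTF-8 was *not* a 'multi-byte character set' in the lower
range (code points 0-127, which coincide with ASCII): those characters
take one byte each, and only characters above that range need more.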
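
A quick way to check this for yourself, as a small Python sketch (the
sample characters and the function name are just my own picks):

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode the given character."""
    return len(ch.encode("utf-8"))


# A plain ASCII letter, then Latin, Greek, Cyrillic and the euro sign.
for ch in ["a", "ö", "π", "я", "€"]:
    print(repr(ch), utf8_length(ch))

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8,
# so existing ASCII data is already valid UTF-8.
assert "keyword".encode("utf-8") == "keyword".encode("ascii")
```

So data that really is 7-bit ASCII today should not change at all; it
is only the new non-ASCII records that take more than one byte per
character.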


> [Yes, I have been through all of the arguments on 'centralized corporate
> reporting needs' versus 'local country reporting needs'. I believe I
> understand them.  In this context I am *only* talking about the
> centralized corporate reporting.]
>
> How do I train users not accustomed to working with diacritical marks on
> their use?

Okay, I understand that would be a problem. Of course, training users
is *always* a problem! ;-)


> How do I get users to understand that searching for a 'keyword' that
> returned all records of interest in the past will no longer do so
> because some data records will have that keyword spelled with different
> characters they may not be familiar with?
>
> What advice can I give to users who currently extract the ASCII-7 data
> and throw it into MS Access for post-processing on the issues they will
> run into when that data is suddenly UTF-8?
>
>
> Those are some of the concerns I have in-house.
>
> For worldwide ExtraNet or InterNet applications, (yes - they are
> different things), I have fewer concerns about UTF-8 and would, most
> likely, advocate its use.

Okay. Thanks.
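
On the keyword search question above, one possible approach (only a
sketch, and I am assuming the reporting layer can store or compute an
extra search key per record) is to compare accent-stripped, case-folded
keys rather than the raw text. The function name search_key is
hypothetical. It uses Python's standard unicodedata module:

```python
import unicodedata


def search_key(text: str) -> str:
    """Build an accent-insensitive, case-insensitive key for matching."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is a more thorough lower() intended for comparisons.
    return stripped.casefold()


records = ["Özçelik", "Ozcelik", "Zürich", "Zurich"]
query = "ozcelik"
print([r for r in records if search_key(r) == search_key(query)])
```

This only helps with letters that decompose into a base letter plus a
mark. Letters such as 'ø' have no such decomposition, and non-Latin
scripts would need transliteration, which is a different problem.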


-- 
T. R. Valentine
Use a decent browser: Safari, Firefox, Mozilla, Opera
(Avoid IE like the plague it is)

