[thelist] non-Roman characters in URIs [was: [TIP] - Use UTF-8 whenever possible]

kasimir-k evolt at kasimir-k.fi
Fri May 12 13:51:25 CDT 2006


T. R. Valentine wrote on 12/05/2006 17:01:
> So, how does someone with an Arabic or Armenian or Chinese ChaJei --
> or any one of dozens -- keyboard layout enter the non-accented Latin
> characters (000000–00007F of Unicode or ASCII) (besides using Alt+
> codes or copying and pasting)?

Good question! It would be great if list members using such keyboards 
could tell us how it works in real life. Meanwhile, I had a look at 
http://en.wikipedia.org/wiki/Keyboard_layout
"Also, most non-Roman keyboard layouts have the capacity to be used to 
input Roman letters as well as the script of the language"

I would expect Roman letters to be printed on these keyboards as well, 
since they are so commonly needed, but I am just guessing here.

> My concern/interest is in having the Internet as international as possible.

You are not alone there, another Wikipedia article worth checking out: 
http://en.wikipedia.org/wiki/Internationalized_domain_names
"Internationalizing Domain Names in Applications (IDNA) is a mechanism 
defined in 2003 for handling internationalized domain names containing 
non-ASCII characters. ... it was decided that non-ASCII domain names 
should be converted to a suitable ASCII-based form by web browsers ..."

See also http://www.icann.org/topics/idn.html
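To make the "convert to an ASCII-based form" part concrete, here is a 
minimal sketch in Python 3. It relies on the standard library's built-in 
"idna" codec, which implements the 2003 rules (RFC 3490, with Punycode 
from RFC 3492); the later IDNA2008 revision differs in some details. The 
host name bücher.example and the two helper functions are purely 
illustrative, not part of any browser or registry API.

```python
# Sketch: round-tripping a host name through IDNA (2003 rules) using
# Python's built-in "idna" codec. Helper names are hypothetical.

def to_ascii_host(host: str) -> str:
    """Convert a possibly non-ASCII host name to its ASCII (xn--) form."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Convert an ASCII host name containing xn-- labels back to Unicode."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "bücher.example"  # illustrative name only
    ascii_name = to_ascii_host(name)
    print(ascii_name)                   # xn--bcher-kva.example
    print(to_unicode_host(ascii_name))  # bücher.example
    print(to_ascii_host("example.com"))  # plain ASCII names pass through
```

The ASCII form is what actually travels through DNS, so someone whose 
keyboard has no "ü" can still type xn--bcher-kva.example and reach the 
same host.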

Many are concerned, though, that internationalizing domain names will 
lead to babelization.

Also, it is good to remember that while users of non-Roman keyboards 
can usually type Roman characters too (and the letters may even be 
printed on the keys), users of Roman keyboards have no means of typing 
Chinese or Arabic. So using non-Roman characters without the possibility 
of converting them to Roman characters (which IDNA does provide) would 
make the Internet less international, not more.

.k
