[thelist] What a "Rush!" - Russian encodings

Evgeni Sergeev evgeni_sergeev at hotmail.com
Tue Jan 21 22:08:00 CST 2003

Russian text encoding has always been a big problem.

When Russian text is presented online, it is customarily given in several
encodings. This is necessary because each major platform has its own
encoding: one for Windows, one for DOS, one for Unix, etc. Why? It has to
do with how the technology came to Russia. The corporate giants, as often
happens, came before the Official International Standards (yeah yeah, tm).

Ok, here's a small list:

KOI8-R : this was one of the first. Unix systems
   (note, KOI8-U is for Ukrainian characters)
CP1251 (Code Page 1251, also called windows-1251) : Windows
CP866 : DOS and OS/2
ISO-8859-5 : standard
I think Unicode keeps the Cyrillic letters in the same order as ISO-8859-5,
but at different positions.
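
To make the list above concrete, here is a small Python sketch (my choice of language, nothing from the original discussion) that encodes one Russian word with each encoding and prints the raw bytes. The codec names are the ones Python's standard library uses. The helper iso_to_unicode_offsets is a hypothetical name of mine; it only checks the letters А to я, not the whole ISO-8859-5 table.

```python
# The same Russian word, encoded with each of the encodings listed above.
text = "Привет"

# Codec names as spelled in Python's standard library.
ENCODINGS = ["koi8_r", "cp1251", "cp866", "iso8859_5", "utf_8"]

encoded = {name: text.encode(name) for name in ENCODINGS}
for name, raw in encoded.items():
    # Same text, different bytes for every encoding.
    print(f"{name:10} {raw.hex(' ')}")


def iso_to_unicode_offsets():
    """Return the set of (Unicode code point - ISO-8859-5 byte) differences
    for the basic letters, bytes 0xB0..0xEF (hypothetical helper)."""
    return {ord(bytes([b]).decode("iso8859_5")) - b for b in range(0xB0, 0xF0)}


# A single value in this set means: same order, shifted position.
print(iso_to_unicode_offsets())
```

If the set printed at the end contains exactly one number, the letters sit in the same order in both tables and differ only by a constant shift.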

I would advise using windows-1251, ISO-8859-5 and KOI8-R. And Unicode, but
this may be unsupported on older systems (there are a lot of those in
Russia). Specify the encoding in the charset parameter of the Content-Type,
both in the META tag and in the HTTP header. How to do
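
As a rough illustration of declaring the charset, the Python sketch below builds a page whose META tag, HTTP header line and actual bytes all agree on one encoding. The function name render_page and the default charset are my own assumptions for the example; a real site would set the header through its web server or framework.

```python
def render_page(body_text, charset="windows-1251"):
    """Return (header_line, page_bytes) for a minimal HTML page.

    Hypothetical helper: the declared charset and the encoding actually
    used for the bytes are kept identical, which is the whole point.
    """
    html = (
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head><body>\n"
        f"{body_text}\n"
        "</body></html>\n"
    )
    header_line = f"Content-Type: text/html; charset={charset}"
    # Raises UnicodeEncodeError if the text does not fit the chosen charset.
    return header_line, html.encode(charset)


header_line, page_bytes = render_page("Привет", charset="koi8-r")
print(header_line)
print(len(page_bytes), "bytes")
```

The most common mistake is declaring one charset and saving the file in another, so deriving both from a single variable, as here, avoids that.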

You may want to have a number of different versions of a page in different
encodings, or have a script translate the same piece of text, e.g. ISO-encoded
text, to other encodings. I found a SourceForge project for this kind of
translation. I don't know anything about it, but it might be good:
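
Independently of that project, the translation step itself can be sketched with the codecs built into Python. This is only a minimal illustration, not how any particular tool does it; the names transcode and all_versions are made up for the example, and characters missing from the target encoding are replaced with "?" rather than reported.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another (hypothetical helper).

    Characters that do not exist in the target charset become '?'.
    """
    return data.decode(source).encode(target, errors="replace")


def all_versions(data: bytes, source: str, targets) -> dict:
    """Produce one re-encoded copy of the same text per target charset."""
    return {target: transcode(data, source, target) for target in targets}


# Start from ISO-8859-5 text and derive the other versions from it.
iso_bytes = "Привет".encode("iso8859_5")
versions = all_versions(iso_bytes, "iso8859_5", ["koi8_r", "cp1251", "cp866"])
for name, raw in versions.items():
    print(f"{name:8} {raw.hex(' ')}")
```

Keeping one master copy and generating the rest means the versions cannot drift apart when the text is edited.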

If you want to work with Unicode, read their pages:

More information on the different character encodings (with tables and
images) is here:

Otherwise, try searching for e.g. "koi8 iso-8859-5 1251" using your favourite
search engine.

Encoding is one of those things that should be straightforward, but is not.
It would be nice to spend the time wasted on multiple encodings on design and
content instead. You Westerners don't have this problem. Welcome to Russia.

  -- Evgeni Sergeev
