[thelist] ASP.NET Character Encoding

VOLKAN ÖZÇELİK volkan.ozcelik at gmail.com
Sat Oct 22 01:25:34 CDT 2005


>
> The web config file is set to use UTF-8 for responseEncoding and
> requestEncoding. I cannot alter this, because this is what we require
> moving forward. What I am trying to achieve is support for data in the
> database that is produced by another system. Thus, I cannot change the
> encoding of the text in the database, and I cannot change the web.config
> file.

It should not be a problem, as long as you are consistent in the
encodings you use and you know (or can work out by trial and error)
which encoding comes from which source (ISO Greek-encoded database,
UTF-8 encoded request and response, etc.).

> 1. The character in the database has been encoded using a given
> character set. This means that the bits that make up the character can
> be interpreted as that character by any software that knows which
> character set to interpret the character as being a member of.

Correct. If you know which charset the data is in and you have an
appropriate decoder for that charset, then you're done.
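
A minimal sketch of that step, assuming you can get at the raw bytes
(for example from a binary column) and that the data is ISO-8859-1. The
class and method names are made up for illustration, and the charset
name is only an example; substitute the one your other system really
writes.

```csharp
using System;
using System.Text;

public static class DecodeKnownCharset
{
    // Decode raw bytes with the charset they were written in.
    public static string Decode(byte[] raw, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(raw);
    }

    public static void Main()
    {
        // "café" as stored by a system that writes ISO-8859-1 (assumed).
        byte[] fromDb = { 0x63, 0x61, 0x66, 0xE9 };
        string text = Decode(fromDb, "iso-8859-1");

        // Re-encode for a UTF-8 response: the single byte E9 becomes C3 A9.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf8));
    }
}
```

Once the text is a correctly decoded .NET string, the UTF-8
responseEncoding in web.config takes care of the output side.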

>
> 2. The .NET application retrieves the character from the database. It is
> using the UTF-8 character set to interpret the character. Since the
> character was not encoded using UTF-8, the meaning of the character is
> lost. At the bit level, the character however remains unchanged.

Again correct. This is a question of reversible conversion. If you're
lucky, reinterpreting a String from charset A as charset B and then
back as charset A will be done without loss.

There are exceptions to this. For instance, when I tried to convert
from Turkish to Chinese and back to Turkish again, I got garbage data
(which suggests that the conversion between those Chinese and Turkish
charsets is not reversible).

I have not tested it, but a conversion between a Latin/Greek encoding
and UTF-8 will most likely be reversible. I've done it with Turkish,
whose charset (CP1254 or ISO-8859-9) is what we call "extended Latin",
and the Greek charset (CP1253 or ISO-8859-7, I suppose) is a close
relative, so you should not experience much trouble.
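
Whether a wrong decoding can be undone is easy to check. The sketch
below (class and method names are hypothetical) decodes a byte with a
given encoding, encodes the result again, and compares the bytes.
ISO-8859-1 maps every byte value to a character, so nothing is lost;
the UTF-8 decoder replaces bytes that are not valid UTF-8, so the
original byte cannot be recovered.

```csharp
using System;
using System.Text;

public static class RoundTrip
{
    // True when decoding the bytes with 'enc' and encoding the result
    // again yields exactly the original bytes.
    public static bool Survives(byte[] original, Encoding enc)
    {
        byte[] back = enc.GetBytes(enc.GetString(original));
        if (back.Length != original.Length) return false;
        for (int i = 0; i < back.Length; i++)
        {
            if (back[i] != original[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // 0xFE is the Turkish "s with cedilla" in ISO-8859-9.
        byte[] turkish = { 0xFE };

        // Misread as ISO-8859-1: the byte survives the round trip.
        Console.WriteLine(Survives(turkish, Encoding.GetEncoding("iso-8859-1")));

        // Misread as UTF-8: 0xFE is never valid, so the byte is lost.
        Console.WriteLine(Survives(turkish, Encoding.UTF8));
    }
}
```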

>
> 3. Assume UTF-8 supports the use of the character. It is required that
> .NET understands which character set the character was encoded for. The
> bits that constitute the character are extracted. Since .NET knows which
> character set was used to encode the character, it now knows what the
> character *should* be. It can now create the UTF-8 encoding for that
> character.
>
> 4. The character is output to the page and the page is sent to the
> browser. The html page encoding is set to UTF-8.
>
> I found the following function (C#) online which seems to do some of
> what I want. Do you think I'm on the right track?
>
> public static string iso8859_unicode(string src)
> {
>        Encoding iso = Encoding.GetEncoding("iso8859-1");
>        Encoding unicode = Encoding.UTF8;
>        byte[] isoBytes = iso.GetBytes(src);
>        return unicode.GetString(isoBytes);
> }

Again correct; you are on the right track. If you have not succeeded
yet, the rest is mostly trial and error: try conversions between
different code pages and charsets and hope that one of them sorts out
the problem. If not, try more combinations and swap the order of the
parameters (i.e. permute them), no matter how nonsensical it seems
(after seeing it work, you can always come up with a logical
explanation :) ).
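
The trial and error can be automated. This is only a sketch: the class
name, the method name and the candidate list are my assumptions, and
charsets that the runtime does not provide are simply skipped. It
re-encodes a garbled sample with every candidate and decodes it with
every other one, so you can eyeball which pair produces readable text.

```csharp
using System;
using System.Text;

public static class EncodingGuesser
{
    // Get the bytes back using 'wrong', then decode them as 'right'.
    // Returns null if either charset is not available on this runtime.
    public static string TryRepair(string garbled, string wrong, string right)
    {
        try
        {
            byte[] raw = Encoding.GetEncoding(wrong).GetBytes(garbled);
            return Encoding.GetEncoding(right).GetString(raw);
        }
        catch (ArgumentException) { return null; }
        catch (NotSupportedException) { return null; }
    }

    public static void Main()
    {
        // Candidate list is an assumption; extend it as needed.
        string[] candidates = { "iso-8859-1", "iso-8859-7", "windows-1253", "utf-8" };

        // Sample: the UTF-8 bytes of "é" (C3 A9) misread as ISO-8859-1.
        string garbled = "\u00C3\u00A9";

        foreach (string wrong in candidates)
        {
            foreach (string right in candidates)
            {
                if (wrong == right) continue;
                string result = TryRepair(garbled, wrong, right);
                if (result != null)
                {
                    Console.WriteLine(wrong + " -> " + right + ": " + result);
                }
            }
        }
    }
}
```

For this sample the pair iso-8859-1 -> utf-8 gives back "é", which is
the same direction as the function you found.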

Here are two methods I use for that purpose (both need
"using System.Text;"):

// Get the bytes back using DataReadCodePage, then decode them
// as DatabaseCodePage.
public static string ServerStringToResponseString(string value) {
    return Encoding.GetEncoding(setting.DatabaseCodePage).GetString(
        Encoding.GetEncoding(setting.DataReadCodePage).GetBytes(value));
}

// The inverse: get the bytes using DatabaseCodePage, then decode
// them as DataReadCodePage.
public static string ResponseStringToServerString(string value) {
    return Encoding.GetEncoding(setting.DataReadCodePage).GetString(
        Encoding.GetEncoding(setting.DatabaseCodePage).GetBytes(value));
}

where "setting" refers to my ApplicationSetting object, from which I
read global preferences.
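
To try the two methods on their own, here is a self-contained sketch.
The ApplicationSetting class below is only a hypothetical stand-in, and
the two code page numbers (65001 for UTF-8, 28591 for ISO-8859-1) are
assumptions chosen for the demonstration, not the values of any real
application.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the real ApplicationSetting object.
public class ApplicationSetting
{
    public int DatabaseCodePage = 65001; // assumed: UTF-8
    public int DataReadCodePage = 28591; // assumed: ISO-8859-1
}

public static class Converter
{
    static readonly ApplicationSetting setting = new ApplicationSetting();

    public static string ServerStringToResponseString(string value)
    {
        return Encoding.GetEncoding(setting.DatabaseCodePage).GetString(
            Encoding.GetEncoding(setting.DataReadCodePage).GetBytes(value));
    }

    public static string ResponseStringToServerString(string value)
    {
        return Encoding.GetEncoding(setting.DataReadCodePage).GetString(
            Encoding.GetEncoding(setting.DatabaseCodePage).GetBytes(value));
    }

    public static void Main()
    {
        // The UTF-8 bytes of "é" (C3 A9) read as ISO-8859-1.
        string garbled = "\u00C3\u00A9";
        string repaired = ServerStringToResponseString(garbled);

        // The second method undoes the first one.
        string again = ResponseStringToServerString(repaired);
        Console.WriteLine(repaired == "\u00E9");
        Console.WriteLine(again == garbled);
    }
}
```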


HTH,
--
Volkan Ozcelik
+>Yep! I'm blogging! : http://www.volkanozcelik.com/volkanozcelik/blog/
+> My projects/studies/trials/errors : http://www.sarmal.com/


