[thelist] ASP.NET Character Encoding

Chris at globet.com
Fri Oct 21 09:05:35 CDT 2005


Volkan

> Generally, localization issues are hard to solve because they are
> combinations of several components and are mostly
> platform/machine/region specific.
> 
> My guess: the issue is to do with your web.config file as
> well as the aspx page encoding attribute and charset meta tags.

The web.config file is set to use UTF-8 for responseEncoding and
requestEncoding. I cannot alter this, because this is what we require
moving forward. What I am trying to achieve is support for data in the
database that is produced by another system. Thus, I cannot change the
encoding of the text in the database, and I cannot change the web.config
file.

I have been considering the matter of encoding for a few days now. If
you have a few minutes, could you comment on the following? If I have
grasped the concepts correctly, then I should be well on the way to
solving the problem. If not, I need to think about it more :) For the
purposes of this example, I will consider a single character held in a
database.

1. The character in the database has been encoded using a given
character set. This means that the bits that make up the character can
be interpreted as that character by any software that knows which
character set was used to encode it.

2. The .NET application retrieves the character from the database. It is
using the UTF-8 character set to interpret the character. Since the
character was not encoded using UTF-8, the meaning of the character is
lost. At the bit level, however, the character remains unchanged.

3. Assume UTF-8 supports the use of the character. For the conversion to
work, .NET must know which character set the character was encoded in.
The bits that constitute the character are extracted. Since .NET knows
which character set was used to encode the character, it now knows what
the character *should* be. It can now create the UTF-8 encoding for that
character.

4. The character is output to the page and the page is sent to the
browser. The html page encoding is set to UTF-8.
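
The four steps above can be sketched roughly as follows. This is only a
minimal illustration under stated assumptions: the source character set
is taken to be ISO-8859-1 (the other system's actual character set is
not stated here), the raw bytes are assumed to be available to the
application, and the class and method names are hypothetical.

```csharp
using System.Text;

public static class EncodingSketch
{
    // Steps 1-3: interpret the stored bytes using the character set they
    // were written with. ISO-8859-1 is an assumption for illustration.
    public static string DecodeStored(byte[] storedBytes)
    {
        Encoding source = Encoding.GetEncoding("iso-8859-1");
        return source.GetString(storedBytes);
    }

    // Step 4: produce the UTF-8 bytes for the same characters, which is
    // what a UTF-8 page would carry for this text.
    public static byte[] EncodeForResponse(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }
}
```

For example, the single byte 0xE9 is "é" in ISO-8859-1, so
DecodeStored(new byte[] { 0xE9 }) yields the one-character string U+00E9,
and EncodeForResponse on that string yields the two bytes 0xC3 0xA9. In
an ASP.NET page the last step would normally be handled by the
responseEncoding setting rather than by hand.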

I found the following function (C#) online which seems to do some of
what I want. Do you think I'm on the right track?

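using System.Text;

public static string iso8859_unicode(string src) 
{
	Encoding iso = Encoding.GetEncoding("iso8859-1");
	Encoding unicode = Encoding.UTF8;
	// Encode the characters of src as ISO-8859-1 bytes
	byte[] isoBytes = iso.GetBytes(src);
	// Interpret those same bytes as UTF-8
	return unicode.GetString(isoBytes);
}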
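
One way to check what the function quoted above does is to feed it a
known input and look at the code points that come back. The sketch below
copies the function unchanged into a hypothetical class and adds a
hypothetical CodePoints helper for inspection; neither name comes from
the original source.

```csharp
using System.Text;

public static class FoundFunctionCheck
{
    // Copy of the function quoted above, behaviour unchanged.
    public static string iso8859_unicode(string src)
    {
        Encoding iso = Encoding.GetEncoding("iso8859-1");
        Encoding unicode = Encoding.UTF8;
        byte[] isoBytes = iso.GetBytes(src);
        return unicode.GetString(isoBytes);
    }

    // Hypothetical helper: lists each UTF-16 code unit as U+XXXX.
    public static string CodePoints(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Calling CodePoints(iso8859_unicode("\u00C3\u00A9")) gives "U+00E9": the
two characters "Ã©" become the bytes 0xC3 0xA9, which read as UTF-8 are
"é". A string that already holds "é" (U+00E9) becomes the lone byte 0xE9,
which is not valid UTF-8, so it does not come back as "é". This suggests
the function suits text whose UTF-8 bytes were previously read as
ISO-8859-1, which may or may not be the situation with this database.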

Thanks a lot for all of your help with this!

Chris Marsh
Web Developer
http://www.globet.com/
Tel: +44 20 8246 4804 Ext 828
Fax: +44 20 8246 4808




