[thelist] ASP.NET Character Encoding

Robert Vreeland vreeland at studioframework.com
Sat Oct 22 11:27:05 CDT 2005

Not sure if this will help, but you should be able to get the collation of
the database columns. Unfortunately, I'm not sure of the SQL query to do so.
In any event, here is a link to some info found on MSDN. It may be a good
starting point.
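
As a rough sketch of what such a query could look like, assuming the database is SQL Server (the thread does not say which one it is): the INFORMATION_SCHEMA.COLUMNS view exposes a COLLATION_NAME column for character columns. The class and method names below are hypothetical, and opening the connection is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CollationLookup
{
    // Assumption: SQL Server, whose INFORMATION_SCHEMA.COLUMNS view
    // includes a COLLATION_NAME column.
    public const string Query =
        "SELECT COLUMN_NAME, COLLATION_NAME " +
        "FROM INFORMATION_SCHEMA.COLUMNS " +
        "WHERE TABLE_NAME = @table";

    // Returns column name -> collation name (null for non-character columns).
    // The connection must already be open.
    public static Dictionary<string, string> GetColumnCollations(
        IDbConnection openConnection, string tableName)
    {
        var result = new Dictionary<string, string>();
        using (IDbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = Query;
            IDbDataParameter p = cmd.CreateParameter();
            p.ParameterName = "@table";
            p.Value = tableName;
            cmd.Parameters.Add(p);

            using (IDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    string collation = reader.IsDBNull(1) ? null : reader.GetString(1);
                    result[reader.GetString(0)] = collation;
                }
            }
        }
        return result;
    }
}
```

The collation name (for example, one ending in a code page hint) would then suggest which character set the other system used when writing the data.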


Robert Vreeland  

-----Original Message-----
From: thelist-bounces at lists.evolt.org.uk
[mailto:thelist-bounces at lists.evolt.org.uk] On Behalf Of Chris at globet.com
Sent: Friday, October 21, 2005 10:06 AM
To: thelist at lists.evolt.org
Subject: Re: [thelist] ASP.NET Character Encoding


> Generally localization issues are hard to solve, cuz they are 
> combinations of several components and are mostly 
> platform/machine/region specific.
> My guess; the issue is to do with your web.config file as well as aspx 
> page encoding attribute and charset meta tags.

The web config file is set to use UTF-8 for responseEncoding and
requestEncoding. I cannot alter this, because this is what we require moving
forward. What I am trying to achieve is support for data in the database
that is produced by another system. Thus, I cannot change the encoding of
the text in the database, and I cannot change the web.config file.

I have been considering the matter of encoding for a few days now. If you
have a few minutes, I wonder whether you could comment on the following. If I
have grasped the concepts correctly, then I should be well on the way to
solving the problem. If not, I need to think about it more :) For the purposes
of this example, I will consider a single character held in a database.

1. The character in the database has been encoded using a given character
set. This means that the bits that make up the character can be interpreted
as that character by any software that knows which character set was used
to encode it.

2. The .NET application retrieves the character from the database. It is
using the UTF-8 character set to interpret the character. Since the
character was not encoded using UTF-8, the meaning of the character is lost.
At the bit level, however, the character remains unchanged.

3. Assume UTF-8 supports the use of the character. It is required that .NET
understands which character set the character was encoded for. The bits that
constitute the character are extracted. Since .NET knows which character set
was used to encode the character, it now knows what the character *should*
be. It can now create the UTF-8 encoding for that character.

4. The character is output to the page and the page is sent to the browser.
The html page encoding is set to UTF-8.
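
The four steps above can be sketched at the byte level as follows. This is only an illustration under stated assumptions: the source character set is taken to be ISO-8859-1, the sample character is 'é' (byte 0xE9 in that set), and the class and method names are hypothetical. Note also that a .NET string is held internally as UTF-16, so the "UTF-8 encoding" of step 3 only comes into being when the string is turned back into bytes.

```csharp
using System;
using System.Text;

public static class EncodingSteps
{
    // Step 2: interpret the stored bytes as UTF-8 even though they are not.
    // Invalid sequences are replaced with U+FFFD by the default decoder.
    public static string MisreadAsUtf8(byte[] stored)
    {
        return Encoding.UTF8.GetString(stored);
    }

    // Step 3: decode with the character set the bytes were written in
    // (assumed here to be ISO-8859-1), then encode the string as UTF-8.
    public static byte[] Latin1BytesToUtf8Bytes(byte[] stored)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string text = latin1.GetString(stored);   // correct character recovered
        return Encoding.UTF8.GetBytes(text);      // UTF-8 bytes for the page
    }
}
```

With the stored byte array { 0xE9 }, MisreadAsUtf8 returns the single replacement character U+FFFD, while Latin1BytesToUtf8Bytes returns the two bytes C3 A9. For step 4, ASP.NET encodes strings written to the response using the configured responseEncoding, so in practice it should be enough to obtain a correctly decoded string.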

I found the following function (C#) online which seems to do some of what I
want. Do you think I'm on the right track?

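// Requires: using System.Text;
// Encodes src as ISO-8859-1 bytes, then decodes those bytes as UTF-8.
public static string iso8859_unicode(string src) {
	Encoding iso = Encoding.GetEncoding("iso-8859-1");
	Encoding unicode = Encoding.UTF8;
	byte[] isoBytes = iso.GetBytes(src);
	return unicode.GetString(isoBytes);
}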
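
One way to check which direction this function converts in is to feed it known input. The sketch below repeats the same logic under a hypothetical name (with the closing brace in place) and adds a hypothetical helper for the opposite case, where the raw bytes are known to be ISO-8859-1.

```csharp
using System;
using System.Text;

public static class FoundFunctionCheck
{
    // Same logic as the function found online.
    public static string Iso8859Unicode(string src)
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");
        byte[] isoBytes = iso.GetBytes(src);        // string -> ISO-8859-1 bytes
        return Encoding.UTF8.GetString(isoBytes);   // those bytes read as UTF-8
    }

    // Hypothetical helper for the opposite case: raw bytes that are
    // known to be ISO-8859-1 are decoded into a .NET string.
    public static string DecodeLatin1(byte[] raw)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

Iso8859Unicode("\u00C3\u00A9") returns "é": it repairs text whose UTF-8 bytes were wrongly decoded as ISO-8859-1. Iso8859Unicode("\u00E9") returns U+FFFD, because a lone 0xE9 byte is not valid UTF-8. So the function seems to fit only if the strings arrive looking like "Ã©"; if the data is ISO-8859-1 bytes, something along the lines of DecodeLatin1 is closer to step 3.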

Thanks a lot for all of your help with this!

Chris Marsh
Web Developer
Tel: +44 20 8246 4804 Ext 828
Fax: +44 20 8246 4808


