[thelist] Weird Data from Text File: ï»¿
liorean
liorean at gmail.com
Tue Mar 11 04:21:51 CDT 2008
> Casey Crookston wrote:
> > ï»¿Advertiser
> >
> > I have NO idea where the ï»¿ is coming from! It most certainly is not
> > in the text file!!!
> >
> > Any ideas?
It's a typical encoding problem, as follows:
- UTF-8 is an 8-bit unit variable width encoding that is a superset of US-ASCII.
- UTF-16 uses a magic cookie (the Byte Order Mark, BOM) at the start of
the document to tell implementations which byte order it is encoded in.
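- Windows traditionally uses an ANSI encoding (which one depends on
locale) that is built on 8-bit units and is a superset of US-ASCII. In
Western locales (Windows-1252, for instance) it is fixed width, one
byte per character; some East Asian code pages are variable width.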
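A rough Python sketch of the "superset of US-ASCII" point, assuming
Windows-1252 (Python's "cp1252" codec) as the ANSI code page. The
sample strings are only there for illustration:

```python
# Pure ASCII text produces identical bytes in US-ASCII, UTF-8 and cp1252.
ascii_text = "Advertiser"
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("utf-8")
    == ascii_text.encode("cp1252")
)

# Outside the ASCII range the encodings diverge:
# UTF-8 needs two bytes for "é", cp1252 needs one.
utf8_bytes = "é".encode("utf-8")    # b'\xc3\xa9'
ansi_bytes = "é".encode("cp1252")   # b'\xe9'
```

This is why a file with only ASCII content looks the same either way,
and the trouble only shows up once non-ASCII bytes are present.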
In order to differentiate UTF-8 from ANSI, Microsoft tools such as
Notepad prepend the BOM character (U+FEFF), encoded as the three UTF-8
code units EF BB BF, to signal that the encoding is UTF-8.
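To see where those three bytes come from, here is a small Python
sketch that encodes the BOM character in each encoding. The helper
name starts_with_utf8_bom is made up for this example:

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the byte order mark character

# In UTF-16 the BOM is two bytes whose order reveals the byte order.
utf16_le = BOM.encode("utf-16-le")   # b'\xff\xfe'
utf16_be = BOM.encode("utf-16-be")   # b'\xfe\xff'

# In UTF-8 the same character becomes three code units.
utf8_bom = BOM.encode("utf-8")       # b'\xef\xbb\xbf'


def starts_with_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the data begins with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)
```

UTF-8 has no byte order to mark, so here the BOM only acts as a
signature for the encoding.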
If that BOM is interpreted as ANSI (Windows-1252 here), it will show
up as the three-character sequence "ï»¿".
So, it's there because you're treating a UTF-8 encoded file as ANSI.
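A minimal Python sketch of the effect and one possible remedy. I'm
assuming the file starts with a UTF-8 BOM followed by "Advertiser" and
that the ANSI code page is Windows-1252; the byte string stands in for
the real file, which I haven't seen:

```python
# Hypothetical file contents: UTF-8 BOM followed by the first field.
raw = b"\xef\xbb\xbfAdvertiser"

# Decoding as ANSI turns the three BOM bytes into three characters.
wrong = raw.decode("cp1252")      # 'ï»¿Advertiser'

# Decoding as UTF-8 with BOM handling recognises and drops the signature.
right = raw.decode("utf-8-sig")   # 'Advertiser'
```

In Python the same codec name works when opening a file directly, as
in open(path, encoding="utf-8-sig"). Other languages have their own
way of saying "this is UTF-8, skip the BOM".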
--
David "liorean" Andersson