[thelist] XML/XSLT and accented characters...

darren darren at web-bitch.co.uk
Wed Oct 3 08:19:47 CDT 2001


On 03 October 2001 at 13:40:23, aardvark <roselli at earthlink.net> wrote:

a> encode your encodings...

a> &uacute; --> &amp;uacute;

<grin>  i'd already tried this...and it doesn't seem to work, you get
&amp;uacute; in the text!

a> and if you use the numerical entity (which tends to offer more
a> support across older browsers, since the named ones generally
a> came later), it would be the same...

don't these change depending on the character set used?  for example
i think &#229; = &aring; on windows and &Acirc; on a mac.


the way to do it is...as i found and sam marshall confirmed for me about
10 mins after...(they're his words so he gets the tip accreditation)

<tip type="xml, extended characters" author="S.Marshall at open.ac.uk">
The best way is to move to a completely UTF-8 system:

1. Make sure that your XML files are specified as encoding UTF-8 (this is
the default, but specify it anyway) using the following header:

<?xml version="1.0" encoding="UTF-8"?>

2. Write your XML files with an editor that supports UTF-8 format, such as
Notepad, Textpad, or Word 2000 ('text only UTF-8'), on Windows 2000. I don't
know which editors to suggest on non-Windows platforms.

3. Enter special characters normally without escaping them and make sure
your document is saved as UTF-8.

4. Make sure that your templates for HTML output contain the header for
UTF-8 as well:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

5. If possible, make your server also send the above MIME type with charset
in its HTTP Content-Type header when it serves the HTML files (if that's not
possible, it doesn't matter too much).
</tip>
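
A minimal sketch of how steps 1 and 4 might look on the XSLT side, assuming
an XSLT 1.0 processor and an HTML result. The "page" and "title" element
names are made up for illustration; they are not from the original files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- ask the processor to serialise the result as UTF-8 HTML -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- "page" and "title" are hypothetical element names -->
  <xsl:template match="/page">
    <html>
      <head>
        <!-- written as an empty element so the stylesheet stays well-formed -->
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <!-- accented characters typed directly, no entities -->
        <p>déjà vu</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet file itself has to be saved as UTF-8 too, the same as the
XML source. The XSLT 1.0 spec says the html output method should add the
charset META element itself, so depending on the processor the explicit
one here may be redundant.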

<tip type="xml, unicode codes" author="darren">
you can enter the unicode value for the extended characters in a similar
way to the old numerical entities.  in xml you escape them with:

   &#value;     for decimal references
   &#xvalue;    for hexadecimal references

to find your unicode references try

   http://www.unicode.org/charts/

for a list of all the charts.

alternatively character map on windows 2000 will display the unicode
values as well as the decimal ones.
</tip>
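
As a small illustration of the two reference forms, here is u with an acute
accent (U+00FA, decimal 250) written three ways. The "words" and "word"
element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "words" and "word" are hypothetical element names -->
<words>
  <word>men&#250;</word>   <!-- decimal reference -->
  <word>men&#xFA;</word>   <!-- hexadecimal reference, same character -->
  <word>menú</word>        <!-- typed directly in a file saved as UTF-8 -->
</words>
```

An XML parser should read all three as the same text, since the number in a
character reference is the unicode code point and not a position in whatever
encoding the file happens to use.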




