[thelist] FW: XML PHP Special Characters

Ivo P ipletikosic at gmail.com
Tue Mar 29 10:45:44 CST 2005


I was having the same problems with documents that contained copyright
symbols and non-Latin character sets, and needed to convert all of these
to their equivalent numeric entities. I get these documents in massive
amounts from various sources, so I needed to automate this.

I solved the issue by running the XML files through tidy
(http://www.w3.org/People/Raggett/tidy/) with a configuration
specifying XML input/output and numeric entities (the input-xml,
output-xml and numeric-entities options). It takes seconds even for
huge files and converts all special characters into their numeric
entity equivalents on the fly.
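
For anyone who wants to see what that conversion amounts to, here is a
rough sketch in plain PHP. This is not tidy and not what I actually run;
it only illustrates the same transformation (non-ASCII character to
decimal numeric entity). It assumes the input string is valid UTF-8 and
a PHP version with closures (5.3 or later), and to_numeric_entities()
is a made-up helper name, not a built-in.

```php
<?php
// Sketch only: replace every non-ASCII character in a UTF-8 string with
// its decimal numeric entity (the trademark sign becomes &#8482;).
// to_numeric_entities() is a hypothetical helper, not part of tidy or PHP.
// Assumption: $utf8 is valid UTF-8; on invalid input preg_replace_callback
// returns null instead of a string.
function to_numeric_entities($utf8)
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',              // one non-ASCII code point at a time
        function ($m) {
            $bytes = array_values(unpack('C*', $m[0]));
            $n = count($bytes);
            // Lead byte: drop the length prefix bits (110x, 1110x, 11110x).
            $cp = $bytes[0] & (0xFF >> ($n + 1));
            // Each continuation byte contributes its low 6 bits.
            for ($i = 1; $i < $n; $i++) {
                $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
            }
            return '&#' . $cp . ';';
        },
        $utf8
    );
}

// "\xE2\x84\xA2" is the trademark sign and "\xC2\xA9" the copyright
// sign, both written as raw UTF-8 bytes.
echo to_numeric_entities("Acme\xE2\x84\xA2 \xC2\xA9 2005"), "\n";
// prints: Acme&#8482; &#169; 2005
```

Since the result is pure ASCII, it should come through an iso-8859-1
page unchanged, which is the point of converting to numeric entities in
the first place.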


On Mon, 28 Mar 2005 14:40:20 -0800, Mark Joslyn
<Mark.Joslyn at solimarsystems.com> wrote:
> List,
> 
> I am parsing an XML document with PHP:
> 
> // Create an XML parser
> $xml_parser = xml_parser_create();
> 
> // Set the functions to handle opening and closing tags
> xml_set_element_handler($xml_parser, "startElement", "endElement");
> 
> etc..
> 
> I have a special character inside the XML document (trademark symbol ™) that
> is being parsed and what is returned is a question mark.
> 
> The XML document is encoded as:
> 
> <?xml version="1.0" encoding="iso-8859-1"?>
> 
> The special character I am using is coded as:
> 
> &#x2122; (I have also tried &#8482;)
> 
> The php page is encoded:
> 
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
> 
> Is there a way I can have this special character go through the parsing
> process but still return a valid trademark symbol?
> 
> Any help would be appreciated.
> 
> Thanks,
> 
> markJ

