[thelist] FW: XML PHP Special Characters
Ivo P
ipletikosic at gmail.com
Tue Mar 29 10:45:44 CST 2005
I was having the same problems with documents that contained copyright
symbols and non-Latin characters, and I needed to convert all of them to
their equivalent numeric entities. I get these in massive amounts from
various sources, so I needed to automate this.
I solved the issue by running the XML files through tidy
(http://www.w3.org/People/Raggett/tidy/) with a configuration
specifying XML input/output and numeric entities. It takes seconds,
even for huge files, and converts all special characters into their
numeric entity equivalents on the fly.
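
For illustration, a tidy configuration along those lines might look like the
sketch below. This is not the exact file used: the file name tidy.conf and
the char-encoding setting are assumptions. With char-encoding set to ascii,
tidy reads the input as Latin-1 and writes every character above 127 as an
entity; numeric-entities makes those entities numeric rather than named.

```
// tidy.conf (hypothetical file name)
// Treat the input as XML and write XML back out
input-xml: yes
output-xml: yes
// Emit numeric entities instead of named ones
numeric-entities: yes
// Assumption: Latin-1 input; output is ASCII, so characters above 127
// are written as entities
char-encoding: ascii
```

It could then be run as something like:

    tidy -config tidy.conf -o output.xml input.xml

where input.xml and output.xml are placeholder file names. If the source
files are not Latin-1, the encoding option would need to be adjusted.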
On Mon, 28 Mar 2005 14:40:20 -0800, Mark Joslyn
<Mark.Joslyn at solimarsystems.com> wrote:
> List,
>
> I am parsing an XML document with PHP:
>
> // Create an XML parser
> $xml_parser = xml_parser_create();
>
> // Set the functions to handle opening and closing tags
> xml_set_element_handler($xml_parser, "startElement", "endElement");
>
> etc..
>
> I have a special character inside the XML document (trademark symbol ™) that
> is being parsed and what is returned is a question mark.
>
> The XML document is encoded as:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> The special character I am using is coded as:
>
> ™ but I have used ™
>
> The php page is encoded:
>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
>
> Is there a way I can have this special character go through the parsing
> process but still return a valid trademark symbol?
>
> Any help would be appreciated.
>
> Thanks,
>
> markJ
>
> --
>
> * * Please support the community that supports you. * *
> http://evolt.org/help_support_evolt/
>
> For unsubscribe and other options, including the Tip Harvester
> and archives of thelist go to: http://lists.evolt.org
> Workers of the Web, evolt !
>