[thelist] Automatically declare all XHTML character refs in an XML file?

Means, Eric D eric.d.means at boeing.com
Thu Oct 17 16:15:01 CDT 2002


I read the evolt article on embedding HTML in XML[1], and I'm not super
excited about doing things that way.  It's a clever solution, but I'd rather
have all of the content there, not some of it hidden away in comments.

All of the HTML I'll need to be embedding is valid XHTML, which makes it
(almost) valid XML.  So theoretically, I should just be able to stick it in
the XML tree like so:

<post>
  <date>...</date>
  <body><valid XHTML here></body>
  ...
</post>

However, certain things are "automatically" valid in XHTML that aren't in
plain XML, namely entity references (aside from the five predefined ones:
&lt;, &gt;, &amp;, &apos; and &quot;).  So if there's an &mdash; or an
&eacute; in the body, it will validate as XHTML but won't parse as XML
unless those entities are declared.
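
To make the problem concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree parser (my choice for illustration, not
necessarily the parser in question).  The <p> element and the parses()
helper are made up for the example.  It shows an undeclared &eacute; being
rejected, the predefined references being accepted, and the same &eacute;
being accepted once it is declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if a plain XML parser accepts xml_text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &eacute; is not declared anywhere, so the parser rejects it.
undeclared = "<post><body><p>caf&eacute;</p></body></post>"

# &lt; and &amp; are predefined by XML itself and need no declaration.
predefined = "<post><body><p>a &lt; b &amp; c</p></body></post>"

# Declaring the entity in the internal subset makes the reference legal.
declared = (
    '<!DOCTYPE post [<!ENTITY eacute "&#233;">]>'
    "<post><body><p>caf&eacute;</p></body></post>"
)

for label, text in [("undeclared", undeclared),
                    ("predefined", predefined),
                    ("declared", declared)]:
    print(label, parses(text))
```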

I *could* manually define all the possible entities, but that seems both
tedious and like reinventing the wheel.
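
For what it's worth, the "define them all" route doesn't have to be done by
hand.  The sketch below generates the declarations from Python's
html.entities.name2codepoint table, which, as far as I know, covers the
HTML 4 named entities (essentially the same names as the three XHTML 1.0
entity sets).  The function names, the <p> element and the root element
name are assumptions made up for the example, and this is just one possible
workaround, not a fix for the external-entity approach.

```python
from html.entities import name2codepoint
import xml.etree.ElementTree as ET

# XML predefines these five, so they are left out of the generated list.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def entity_declarations():
    """Build one <!ENTITY ...> declaration per named HTML entity."""
    lines = []
    for name in sorted(name2codepoint):
        if name in PREDEFINED:
            continue
        lines.append('<!ENTITY %s "&#%d;">' % (name, name2codepoint[name]))
    return "\n".join(lines)


def wrap_with_doctype(root_name, xml_body):
    """Prefix xml_body with a DOCTYPE whose internal subset declares the entities."""
    return "<!DOCTYPE %s [\n%s\n]>\n%s" % (
        root_name, entity_declarations(), xml_body)


doc = wrap_with_doctype(
    "post", "<post><body><p>caf&eacute; &mdash; ok</p></body></post>")
text = ET.fromstring(doc).find("body/p").text
print(repr(text))
```

The downside is that every document carries a couple of hundred
declarations in its internal subset, which is exactly the bulk that the
external .ent files are meant to avoid.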

How would I go about automatically inserting all the relevant entities into
my XML DTD?  I tried the following:

<!DOCTYPE randomdoctype [
<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
]>

(Those are taken straight out of the XHTML 1.0 Strict DTD, though I've
changed the URLs from relative to absolute.)

These declarations don't work: &mdash; is still reported as invalid.  What
am I doing wrong?

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Eric D Means -- eric.d.means at boeing.com
Embedded Software Engineer
Boeing Sustainment Data Systems
314-233-0484
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-