[thelist] Removing Microsoft Word special characters
rudy
rudy937 at rogers.com
Thu Sep 18 10:25:41 CDT 2003
> $body = ereg_replace(149, "•", $body); // bullet
> ... and others
kris, you may want to reconsider mapping away from
perfectly good code values to microsoftisms...
"MS Windows introduced a group of codings in
which these code positions [128-159] were used
for printable characters, some of which are much
in demand with certain authors: the trademark
glyph, matched quotes and so forth. These are
the encodings such as "code page" 1252. It would
appear to be protocol-correct to offer documents
in these encodings, with 8-bit characters in that
range, as long as they are sent with an appropriate
charset value and the recipient accepts this charset
encoding. THAT IS NOT AT ALL THE SAME THING AS
ATTEMPTING TO REPRESENT THOSE CHARACTERS BY NUMERIC
CHARACTER REFERENCES SUCH AS &#153; AS ONE SO OFTEN
SEES. The meaning of the latter construct is undefined
(N.B: not "illegal", but "undefined") in standard
HTML: the protocol-correct representation of a trademark
as a numeric character reference is in fact &#8482;
as can be seen in the W3C reference already cited; and
correspondingly for the matched quotes and such."
-- http://ppewww.ph.gla.ac.uk/~flavell/charset/internat.html
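
a rough sketch of what the quoted passage implies for code like the
snippet at the top: if the windows-1252 bytes are to be turned into
numeric character references at all, the references should carry the
Unicode code points (8482 for the trademark sign, 8226 for the bullet),
not the windows-1252 byte values. the function name cp1252_to_ncr is
made up for illustration, the map covers only the characters most often
pasted from Word, and the code assumes PHP 7 or later (where
ereg_replace no longer exists) and input that really is single-byte
windows-1252. running it on UTF-8 text would mangle multibyte
characters.

```php
<?php
// Sketch only: cp1252_to_ncr() is a hypothetical helper, not a PHP built-in.
// Assumes $text is single-byte Windows-1252, NOT UTF-8.
function cp1252_to_ncr(string $text): string
{
    // Windows-1252 byte value => Unicode code point (decimal).
    // Partial map: only the characters most commonly pasted from Word.
    $map = [
        128 => 8364, // euro sign
        133 => 8230, // horizontal ellipsis
        145 => 8216, // left single quotation mark
        146 => 8217, // right single quotation mark
        147 => 8220, // left double quotation mark
        148 => 8221, // right double quotation mark
        149 => 8226, // bullet
        150 => 8211, // en dash
        151 => 8212, // em dash
        153 => 8482, // trade mark sign
    ];

    // Build a byte => "&#NNNN;" table for strtr().
    $pairs = [];
    foreach ($map as $byte => $codepoint) {
        $pairs[chr($byte)] = '&#' . $codepoint . ';';
    }

    // strtr() with an array replaces every listed byte in a single pass.
    return strtr($text, $pairs);
}

echo cp1252_to_ncr("Acme\x99 \x93quoted\x94 \x95 item"), "\n";
// prints: Acme&#8482; &#8220;quoted&#8221; &#8226; item
```

the other option the passage allows is to leave the bytes alone and
serve the page with a charset of windows-1252, provided the recipient
accepts that encoding.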
see adrian's seminal article http://evolt.org/entities
rudy