[thelist] Removing Microsoft Word special characters
kris burford
kris at midtempo.net
Thu Sep 18 08:23:35 CDT 2003
>Is there a way to strip "smart" quotes, em dashes and ellipses from Word
>copy? (i.e. all "special" Word characters)
hi graham,
don't know what code format you're using, but this is something that i've
used in the past to try to catch 'em all. (after strip_tags...)
it's in php, btw.
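// ereg_replace() was removed in PHP 7; str_replace() does the same literal replacement.
// codes below 256 are matched as single windows-1252 bytes, larger ones as utf-8 characters (PHP 7+ "\u{...}" syntax).
// the chr() rules assume single-byte input; skip them if $body is already utf-8.
$body = str_replace("&", "&amp;", $body); // 38: ampersand (run this first)
$body = str_replace(chr(133), "&hellip;", $body); // 133: ellipsis
$body = str_replace("\u{2033}", "&Prime;", $body); // 8243: double prime
$body = str_replace("\u{2018}", "'", $body); // 8216: left single quote
$body = str_replace(chr(145), "'", $body); // 145: left single quote
$body = str_replace("\u{2019}", "'", $body); // 8217: right single quote
$body = str_replace(chr(146), "'", $body); // 146: right single quote
$body = str_replace("\u{201C}", "&quot;", $body); // 8220: left double quote
$body = str_replace(chr(147), "&quot;", $body); // 147: left double quote
$body = str_replace("\u{201D}", "&quot;", $body); // 8221: right double quote
$body = str_replace(chr(148), "&quot;", $body); // 148: right double quote
$body = str_replace("\u{2022}", "&bull;", $body); // 8226: bullet
$body = str_replace(chr(149), "&bull;", $body); // 149: bullet
$body = str_replace("\u{2013}", "&ndash;", $body); // 8211: en dash
$body = str_replace(chr(150), "&ndash;", $body); // 150: en dash
$body = str_replace("\u{2014}", "&mdash;", $body); // 8212: em dash
$body = str_replace(chr(151), "&mdash;", $body); // 151: em dash
$body = str_replace("\u{2122}", "&trade;", $body); // 8482: trademark
$body = str_replace(chr(153), "&trade;", $body); // 153: trademark
$body = str_replace(chr(169), "&copy;", $body); // 169: copyright mark
$body = str_replace(chr(174), "&reg;", $body); // 174: registered trademark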
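if you'd rather end up with plain ascii than with entities (closer to "stripping" the characters), the same idea can be sketched as a single lookup table. this is only a rough sketch: it assumes $text is already utf-8, the function name plain_word_chars is made up for the example, and the ascii stand-ins (e.g. "--" for an em dash) are just one possible choice.

```php
<?php
// Hypothetical helper: the name is illustrative, not part of PHP or any library.
// Assumes $text is UTF-8; Windows-1252 input would need converting first.
function plain_word_chars(string $text): string
{
    $map = [
        "\u{2018}" => "'",    // left single quote
        "\u{2019}" => "'",    // right single quote
        "\u{201C}" => '"',    // left double quote
        "\u{201D}" => '"',    // right double quote
        "\u{2013}" => "-",    // en dash
        "\u{2014}" => "--",   // em dash
        "\u{2026}" => "...",  // ellipsis
        "\u{2022}" => "*",    // bullet
        "\u{2122}" => "(tm)", // trademark
        "\u{00A9}" => "(c)",  // copyright mark
        "\u{00AE}" => "(r)",  // registered trademark
    ];
    // strtr() with an array works in one pass, so replaced text is never re-scanned.
    return strtr($text, $map);
}

// prints: "It's done" -- almost...
echo plain_word_chars("\u{201C}It\u{2019}s done\u{201D} \u{2014} almost\u{2026}"), "\n";
```

a single strtr() call also avoids the ordering worries you get with a chain of separate replacements, since nothing that has been substituted can be matched again.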
hth
kris