[thelist] Removing Microsoft Word special characters

kris burford kris at midtempo.net
Thu Sep 18 08:23:35 CDT 2003


>Is there a way to strip "smart" quotes, em dashes and ellipses from Word
>copy? (i.e. all "special" Word characters)

hi graham,

don't know what language you're using, but this is something i've
used in the past to try to catch 'em all (run it after strip_tags()).

it's in php, btw.

   // most characters appear twice: once by windows-1252 byte value
   // (133, 145-153) and once by unicode code point (8211 and up)
   $body = ereg_replace(38, "&amp;", $body); // ampersand
   $body = ereg_replace(133, "&hellip;", $body); // ellipsis
   $body = ereg_replace(8243, "&Prime;", $body); // double prime
   $body = ereg_replace(8216, "'", $body); // left single quote
   $body = ereg_replace(145, "'", $body); // left single quote
   $body = ereg_replace(8217, "'", $body); // right single quote
   $body = ereg_replace(146, "'", $body); // right single quote
   $body = ereg_replace(8220, "&quot;", $body); // left double quote
   $body = ereg_replace(147, "&quot;", $body); // left double quote
   $body = ereg_replace(8221, "&quot;", $body); // right double quote
   $body = ereg_replace(148, "&quot;", $body); // right double quote
   $body = ereg_replace(8226, "&bull;", $body); // bullet
   $body = ereg_replace(149, "&bull;", $body); // bullet
   $body = ereg_replace(8211, "&ndash;", $body); // en dash
   $body = ereg_replace(150, "&ndash;", $body); // en dash
   $body = ereg_replace(8212, "&mdash;", $body); // em dash
   $body = ereg_replace(151, "&mdash;", $body); // em dash
   $body = ereg_replace(8482, "&trade;", $body); // trademark
   $body = ereg_replace(153, "&trade;", $body); // trademark
   $body = ereg_replace(169, "&copy;", $body); // copyright mark
   $body = ereg_replace(174, "&reg;", $body); // registered trademark
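
ereg_replace() was removed in PHP 7, so the list above won't run on a current install. A rough equivalent can be sketched with strtr() and a lookup table. This is only a sketch under some assumptions: the function name strip_word_chars() and its $isCp1252 flag are made up for illustration, the input is taken to be either UTF-8 or raw Windows-1252 bytes, and the characters are flattened to plain ASCII (which is closer to "stripping" than swapping in entities).

```php
<?php
// Hypothetical helper, not from the original post.
// $isCp1252 = true treats $text as raw Windows-1252 bytes;
// otherwise $text is assumed to be UTF-8.
function strip_word_chars(string $text, bool $isCp1252 = false): string
{
    // UTF-8 input: keys are Unicode code points (decimal value in comment)
    $utf8 = [
        "\u{2018}" => "'",     // 8216 left single quote
        "\u{2019}" => "'",     // 8217 right single quote
        "\u{201C}" => '"',     // 8220 left double quote
        "\u{201D}" => '"',     // 8221 right double quote
        "\u{2022}" => '*',     // 8226 bullet
        "\u{2013}" => '-',     // 8211 en dash
        "\u{2014}" => '--',    // 8212 em dash
        "\u{2026}" => '...',   // 8230 ellipsis
        "\u{2122}" => '(TM)',  // 8482 trademark
    ];
    // Windows-1252 input: keys are single bytes (decimal value in comment)
    $cp1252 = [
        "\x91" => "'",     // 145 left single quote
        "\x92" => "'",     // 146 right single quote
        "\x93" => '"',     // 147 left double quote
        "\x94" => '"',     // 148 right double quote
        "\x95" => '*',     // 149 bullet
        "\x96" => '-',     // 150 en dash
        "\x97" => '--',    // 151 em dash
        "\x85" => '...',   // 133 ellipsis
        "\x99" => '(TM)',  // 153 trademark
    ];
    // strtr() with an array replaces every key in one pass and never
    // re-scans text it has already replaced.
    return strtr($text, $isCp1252 ? $cp1252 : $utf8);
}

// usage, UTF-8 input
echo strip_word_chars("\u{201C}it\u{2019}s done\u{2026}\u{201D}"), "\n"; // "it's done..."
```

The two tables are kept separate on purpose: bytes like 0x93 also show up inside multi-byte UTF-8 sequences, so applying the single-byte table to UTF-8 text would corrupt other characters.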

hth

kris



