[thelist] Removing Microsoft Word special characters

Timothy J. Luoma luomat at operamail.com
Fri Sep 19 07:34:58 CDT 2003


On Fri, 19 Sep 2003 18:05:35 +1000, Ken Schaefer <ken at adOpenStatic.com> 
wrote:

> : Thanks for the tips on copy and paste, but I can see now that I didn't
> : explain myself properly. I need to strip the characters without using
> : an intermediate application. I want to clean the copy on the server side
> : and present the cleaned copy as HTML. We are using an ASP-based CMS.
> :
> : I cannot find the character codes to match on (in VBScript) in order to
> : strip Word smart quotes, em dashes, ellipses, etc.

> Why don't you just set the character encoding to Unicode/UTF8? Won't that
> then allow you to use those characters without causing validation 
> errors? Or is my knowledge of these things deficient?


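I believe the problem is that Word uses "windows-1252" rather than true
ISO-8859-1/Latin-1. The two agree everywhere except the range 0x80-0x9F:
windows-1252 puts the smart quotes, dashes and ellipsis there, while
ISO-8859-1 reserves those positions for control characters. A brief Google
search did not find an ASP solution, but there is a nice chart at

http://www.problemtracker.com/pthelp5/WMS/bots_wms_add.htm

which shows the differences between the two character sets.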
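
For what it's worth, here is a rough sketch of the mapping, in Python rather
than VBScript since I have no tested ASP version. It assumes the submitted
text arrives as windows-1252 bytes. The names WORD_CHARS and clean_word_text
are made up for illustration, and the plain-ASCII replacements are only one
possible choice.

```python
# Word "special" characters and plain-ASCII stand-ins (an illustrative choice).
# The byte values in the comments are the windows-1252 positions.
WORD_CHARS = {
    "\u2018": "'",    # left single quote  (0x91)
    "\u2019": "'",    # right single quote (0x92)
    "\u201c": '"',    # left double quote  (0x93)
    "\u201d": '"',    # right double quote (0x94)
    "\u2013": "-",    # en dash            (0x96)
    "\u2014": "--",   # em dash            (0x97)
    "\u2026": "...",  # ellipsis           (0x85)
}


def clean_word_text(raw: bytes) -> str:
    """Decode windows-1252 bytes and swap Word punctuation for ASCII."""
    # Bytes that windows-1252 leaves undefined become U+FFFD instead of
    # raising an error.
    text = raw.decode("cp1252", errors="replace")
    return text.translate(str.maketrans(WORD_CHARS))


if __name__ == "__main__":
    print(clean_word_text(b"\x93Hello\x94 \x97 it\x92s done\x85"))
```

The same table could presumably be carried over to VBScript as a series of
Replace() calls on the matching character codes, but I have not tried that.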

TjL

