[thelist] Removing Microsoft Word special characters
Timothy J. Luoma
luomat at operamail.com
Fri Sep 19 07:34:58 CDT 2003
On Fri, 19 Sep 2003 18:05:35 +1000, Ken Schaefer <ken at adOpenStatic.com>
wrote:
> : Thanks for the tips on copy and paste but I can see now that I didn't
> : explain myself properly. I need to strip the characters without using an
> : intermediate application. I want to clean the copy at the server-side
> : and present the cleaned copy as HTML. We are using an ASP based CMS.
> :
> : I cannot find the character codes to match on (in VBScript) in order to
> : strip Word smart quotes, em dashes, ellipses, etc.
> Why don't you just set the character encoding to Unicode/UTF8? Won't that
> then allow you to use those characters without causing validation
> errors? Or is my knowledge of these things deficient?
I believe that Word uses "windows-1252" rather than true ISO-8859-1/Latin-1,
which is where the problems come in: windows-1252 puts the smart quotes,
dashes and ellipsis at bytes 0x80-0x9F, which ISO-8859-1 reserves for
control characters. A brief Google search did not turn up an ASP solution,
but there is a nice chart at
http://www.problemtracker.com/pthelp5/WMS/bots_wms_add.htm
which shows the differences between the two character sets.
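
A minimal sketch of the server-side cleanup, in Python rather than VBScript
since no ASP version turned up. The function names and the ASCII
replacements chosen here are assumptions for illustration, not part of any
CMS. The code points are the ones windows-1252 assigns to the usual Word
punctuation, so the same table should port to VBScript using Replace() and
ChrW().

```python
# Hypothetical lookup table: Unicode code points that the windows-1252
# bytes 0x82-0x97 decode to, mapped to plain ASCII stand-ins (an assumption;
# pick whatever replacements suit your markup).
WORD_PUNCTUATION = {
    "\u201a": "'",    # 0x82 single low-9 quote
    "\u201e": '"',    # 0x84 double low-9 quote
    "\u2026": "...",  # 0x85 horizontal ellipsis
    "\u2018": "'",    # 0x91 left single quote
    "\u2019": "'",    # 0x92 right single quote
    "\u201c": '"',    # 0x93 left double quote
    "\u201d": '"',    # 0x94 right double quote
    "\u2022": "*",    # 0x95 bullet
    "\u2013": "-",    # 0x96 en dash
    "\u2014": "--",   # 0x97 em dash
}

_TABLE = str.maketrans(WORD_PUNCTUATION)


def clean_word_text(text):
    """Replace Word-style punctuation in an already-decoded string."""
    return text.translate(_TABLE)


def clean_word_bytes(raw):
    """Decode raw form bytes as windows-1252, then clean them.

    errors="replace" is used because a few bytes (0x81, 0x8D, 0x8F, 0x90,
    0x9D) have no character assigned in Python's cp1252 codec.
    """
    return clean_word_text(raw.decode("cp1252", errors="replace"))
```

If the page can be served as UTF-8 instead, as Ken suggests, decoding the
submitted bytes as windows-1252 and re-encoding the result as UTF-8 would
keep the original characters rather than flattening them to ASCII.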
TjL