[thelist] Removing Microsoft Word special characters

Ken Schaefer ken at adOpenStatic.com
Fri Sep 19 07:47:18 CDT 2003

----- Original Message ----- 
From: "Timothy J. Luoma" <luomat at operamail.com>
To: <thelist at lists.evolt.org>
Sent: Friday, September 19, 2003 10:34 PM
Subject: Re: [thelist] Removing Microsoft Word special characters

: On Fri, 19 Sep 2003 18:05:35 +1000, Ken Schaefer <ken at adOpenStatic.com>
: wrote:
: > : Thanks for the tips on copy and paste but I can see now that I didn't
: > : explain myself properly. I need to strip the characters without using
: > : an intermediate application. I want to clean the copy at the server-side
: > : and present the cleaned copy as HTML. We are using an ASP based CMS.
: > :
: > : I cannot find the character codes to match on (in VBScript) in order
: > : to strip Word smart quotes, em dashes, ellipses, etc.
: > Why don't you just set the character encoding to Unicode/UTF-8? Won't
: > that then allow you to use those characters without causing validation
: > errors? Or is my knowledge of these things deficient?
: I believe that Word uses "windows-1252" instead of true ISO-8859-1/Latin-1
: which is where the problems come in.  A brief Google did not find an ASP
: solution but there is a nice chart at
: http://www.problemtracker.com/pthelp5/WMS/bots_wms_add.htm
: which shows the differences between the two character sets.
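
As a rough illustration of the difference between the two character sets, here is a minimal sketch in Python (not VBScript, which is what the CMS uses; the language is chosen only for illustration). It decodes the windows-1252 bytes that Word text usually carries and prints the Unicode code points to match on. The names WORD_BYTES, ASCII_EQUIVALENTS, code_points and clean_word_text are made up for this example, and the ASCII replacements are just one possible choice.

```python
# Bytes in the 0x80-0x9F range that windows-1252 assigns to printable
# characters; ISO-8859-1 reserves the same range for control codes.
WORD_BYTES = bytes([0x85, 0x91, 0x92, 0x93, 0x94, 0x96, 0x97])

# Hypothetical replacement table (an assumption, adjust to taste).
ASCII_EQUIVALENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}


def code_points(raw):
    """Decode windows-1252 bytes and list the resulting Unicode code points."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode("cp1252")]


def clean_word_text(text):
    """Swap the Word punctuation in the table for plain ASCII."""
    return text.translate(str.maketrans(ASCII_EQUIVALENTS))


if __name__ == "__main__":
    print(code_points(WORD_BYTES))
    print(clean_word_text("\u201cHi\u201d \u2014 it\u2019s\u2026"))
```

In VBScript the same code points could presumably be matched with something like Replace(text, ChrW(&H2019), "'"), assuming the text reaches the script as Unicode rather than as raw windows-1252 bytes.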

I only mention charset=UTF-8 because I've added it to various pages
converted from MS Word so that the w3.org validator doesn't complain (and
I can validate the rest of the HTML). It seems to deal fine with all the
characters mentioned (smart quotes, em dashes, etc.).
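
A small sketch of why UTF-8 copes with these characters, again in Python purely for illustration (the helper names to_ascii_html and fits_in_latin1 are hypothetical). It checks that the sample text cannot be encoded as ISO-8859-1 but round-trips through UTF-8, and shows one alternative if the page charset cannot be changed: emitting numeric character references.

```python
def fits_in_latin1(text):
    """Return True if every character can be encoded as ISO-8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_ascii_html(text):
    """Replace each non-ASCII character with a numeric character reference.

    Note: this does not escape &, < or >; it only handles non-ASCII.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Smart quotes, an em dash and an ellipsis, written as escapes.
sample = "\u201cquoted\u201d \u2014 and so on\u2026"
utf8_bytes = sample.encode("utf-8")

if __name__ == "__main__":
    print(fits_in_latin1(sample))               # Latin-1 has no slot for these
    print(utf8_bytes.decode("utf-8") == sample)  # UTF-8 round-trips them
    print(to_ascii_html(sample))
```

The numeric references stay valid whatever charset the page declares, which may suit a CMS where the encoding is hard to change.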

