[thelist] MS Word conversion to HTML

Stephanie C. Smith scsmith at medicine.tamu.edu
Fri Jan 3 10:15:01 CST 2003


If you need the web document to look *absolutely identical* to the Word
one, down to the page breaks and numbering, you're going to end up with
PDF. Sorry.

If not, you can use one of several tools to clean up the junk HTML that
Word produces. Someone already mentioned HTML Tidy; Dreamweaver also
has a "clean up Word HTML" option under Commands. (To the person who can
write an extension to run this on more than one file at a time: Marry
me?) I'm told that Contribute does a nice job of handling text pasted
from a Word doc, but I haven't tried this yet.

Textism.com has an HTML cleaner:
http://www.textism.com/resources/cleanwordhtml/

My personal favorite is the Word Unmunger. It's a Python script, so
it's a little harder to work with than the above options. (It also has
an intermittent bug on OS X; email me off-list if you really want to
know.) It strips out *all* fonts, CSS, etc. and leaves you with a
squeaky-clean file. The author has just added a batch mode, which bumps
this thing from "incredibly useful" to "indispensable" in my work. Get
it here: http://luke.francl.org/software/word-unmunger/
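
For anyone curious what "strips out all fonts, CSS, etc." involves, here is a rough sketch in Python using only the standard library's html.parser. This is NOT the Unmunger's code. The tag and attribute lists are my own assumptions, the cp1252 default for reading files is a guess about how Word saves HTML, and clean_directory is a hypothetical stand-in for a batch mode:

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical rule set, chosen for illustration (not taken from the Unmunger).
DROP_TAGS = {"font", "span", "meta", "link"}   # drop the tag, keep its text
SKIP_CONTENT = {"style", "script", "xml"}      # drop the tag AND everything inside it
KEEP_ATTRS = {"href", "src", "alt", "name", "colspan", "rowspan"}  # style, class, lang... all go


class WordCleaner(HTMLParser):
    """Rebuilds the document with Word's formatting markup removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def _dropped(self, tag):
        # Namespaced Office tags (o:p, v:shape, w:...) are always dropped.
        return self.skip_depth > 0 or tag in DROP_TAGS or ":" in tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self._dropped(tag):
            return
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in KEEP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # <br/> and friends: emit only the start tag, never a stray </br>.
        if tag not in SKIP_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self._dropped(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # Comments, including Word's <!--[if gte mso 9]> blocks, are ignored by
    # HTMLParser's default handle_comment, so they simply disappear.


def clean_html(html):
    parser = WordCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def clean_directory(src, dst, encoding="cp1252"):
    """Batch mode: clean every .htm/.html file in src and write it to dst."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.htm*")):
        text = path.read_text(encoding=encoding, errors="replace")
        (dst / path.name).write_text(clean_html(text), encoding="utf-8")
```

Something like clean_directory("word_exports", "clean") would write a cleaned copy of each file into clean/. Every attribute not in KEEP_ATTRS is thrown away, so widen that list if your pages depend on others.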

Some people adore the Demoroniser, but it doesn't do much for me.
YMMV... http://www.fourmilab.ch/webtools/demoroniser/

Stephanie Smith
Web Communications Specialist
Texas A&M University System Health Science Center
http://www.tamushsc.edu/
scsmith at tamu.edu
