[thelist] strip html etc

Sam sam at sam-i-am.com
Fri Aug 22 16:52:03 CDT 2003


> I think my main advice would be to not try and write one script to do it 
> all in one go. Allow for manual intervention at each step of the way. 
> Even with 5000 files, some tasks are better handled by hand than 
> programmatically.
> 
> Sam

I'd consider stripping all html a last resort. And if you do, remember 
there are some tags in there that you probably want to keep, like <a 
href>, <title> etc. If you strip all tags, you commit yourself to 
manually tagging the content again, file by file. Even in the best case 
you'll likely have to visit each file at some point, but re-tagging 
everything by hand is to be avoided IMO.
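
As a rough illustration of stripping tags while keeping a chosen few, here 
is a minimal Python sketch (Python rather than perl, purely for 
illustration). The KEEP set and the strip_tags name are my own 
assumptions, not part of any existing tool; adjust the allowlist to 
whatever tags you decide are worth keeping.

```python
from html.parser import HTMLParser

# Hypothetical allowlist: tags to keep, everything else is dropped.
KEEP = {"a", "title"}


class SelectiveStripper(HTMLParser):
    """Drop every tag not in `keep`; text and entities pass through."""

    def __init__(self, keep=KEEP):
        # convert_charrefs=False so entities such as &amp; are not decoded
        super().__init__(convert_charrefs=False)
        self.keep = keep
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # Re-emit the tag exactly as written, attributes included
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/>; same rule as a start tag
        if tag in self.keep:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tags(html, keep=KEEP):
    """Return `html` with all tags removed except those named in `keep`."""
    parser = SelectiveStripper(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_tags('<p>See <a href="x.html">this</a> &amp; <b>that</b></p>'))
# prints: See <a href="x.html">this</a> &amp; that
```

Note that this sketch drops comments and doctype declarations, but the 
contents of <script> and <style> blocks would come through as plain text, 
so those are probably better removed in a separate pass first.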

My earlier email assumed a working knowledge of perl or some other 
scripting language for doing text processing. Even if you don't have 
that at your disposal I'll stick by my advice - don't try and get one 
tool to do it all.
Dreamweaver might be your friend for this kind of thing. Its search and 
replace tools, commands and so on are very powerful, if a little slow.
If you get a find/replace that works for you, you can record it as a 
command, then run that command on a set of files.
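
Outside Dreamweaver, the same idea (get one find/replace right, then run 
it over a set of files) might look something like the Python sketch 
below. The replace_in_files name, the dry_run flag and the utf-8 default 
are all assumptions of mine; older pages may well need a different 
encoding. The dry run is there so you can check what would change before 
committing to it, in keeping with doing this one step at a time.

```python
import re
from pathlib import Path


def replace_in_files(paths, pattern, replacement, dry_run=True, encoding="utf-8"):
    """Apply one regex find/replace to each file.

    Returns {path: number_of_matches} for files where the pattern matched.
    With dry_run=True (the default) nothing is written to disk.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in map(Path, paths):
        text = path.read_text(encoding=encoding)
        new_text, count = regex.subn(replacement, text)
        if count:
            changed[str(path)] = count
            if not dry_run:
                path.write_text(new_text, encoding=encoding)
    return changed
```

For example, replace_in_files(Path("site").glob("*.html"), 
r"</?font\b[^>]*>", "") would report how many <font> tags it found in 
each file under a hypothetical "site" directory without touching 
anything; pass dry_run=False once the report looks right, and keep a 
backup copy of the files either way.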

Sam


