[thelist] strip html etc

Wesley Aaron Mason (1st Vamp) wes at pmason.karoo.co.uk
Sat Aug 23 06:59:44 CDT 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: MD5

Well, as someone else suggested, if you have PHP (or Perl), you should be able to easily
strip out all presentation-orientated markup, leave some semantic markup, and perhaps
convert some presentation markup into semantic markup. You'd have to look through a
sample of the documents beforehand (one spread across the whole set) to see if any
presentation markup is in fact used to convey semantic meaning and can thus be converted.
Reading the document tree, looping through the documents, parsing them and then outputting
to a new location for double-checking afterwards is also a piece of the proverbial.
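
To make that a bit more concrete, here is a rough sketch of the walk / parse / write-elsewhere
loop. It is in Python rather than PHP or Perl, purely as an illustration, and uses only the
standard library. The tag and attribute lists (DROP_TAGS, CONVERT_TAGS, DROP_ATTRS) and the
function names clean_html and clean_tree are my own placeholders, not anything from the
actual site; you would fill the lists in after looking at a sample of the real pages. It
also assumes the output directory sits outside the source tree and that you know the
encoding of the files.

```python
import os
from html import escape
from html.parser import HTMLParser

# Hypothetical lists: adjust after inspecting a sample of the real documents.
DROP_TAGS = {"font", "center", "basefont", "blink", "marquee"}  # tag removed, contents kept
CONVERT_TAGS = {"b": "strong", "i": "em"}                       # presentational -> semantic
DROP_ATTRS = {"align", "bgcolor", "background", "color", "face", "style", "valign"}


class MarkupCleaner(HTMLParser):
    """Re-emits a document, dropping or renaming presentational markup."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, closing=""):
        if tag in DROP_TAGS:
            return
        tag = CONVERT_TAGS.get(tag, tag)
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{CONVERT_TAGS.get(tag, tag)}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(text):
    """Return text with presentational tags and attributes stripped or converted."""
    cleaner = MarkupCleaner()
    cleaner.feed(text)
    cleaner.close()
    return "".join(cleaner.out)


def clean_tree(src_root, dst_root, encoding="utf-8"):
    """Clean every .html/.htm file under src_root into a mirrored tree under dst_root.

    Assumes dst_root is outside src_root. Returns the number of files written.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(src, encoding=encoding, errors="replace") as fh:
                cleaned = clean_html(fh.read())
            with open(dst, "w", encoding=encoding) as fh:
                fh.write(cleaned)
            count += 1
    return count
```

Calling clean_tree("old_site", "cleaned_site") (both directory names made up) leaves the
originals untouched and writes the cleaned copies alongside, so you can diff a handful by
hand before trusting the rest. Whether b and i really should become strong and em is exactly
the sort of thing the sample check is for.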

Just an idea, ignore if you're not comfortable with the tools, or dislike the method.

- --
1stVamp
(Wesley Aaron Mason)

[ Site at somewhere :: http://1stvamp.org/ ]
[ Webcomic from nowhere :: http://gfbowl.com/ ]

Saturday, August 23, 2003, 3:36:43 AM, Diane Soini wrote:

> On Friday, August 22, 2003, at 12:29 PM,
> thelist-request at lists.evolt.org wrote:

>> On 8/22/03 8:50 AM, "Tobyn Baugher" <toby at rsux.com> wrote:
>>
>>> On Thu, Aug 21, 2003 at 09:18:00PM -0500, george donnelly wrote:
>>>> My current task is to separate the content from the presentation in
>>>> about 5000+ html pages so that it can be dumped into the new site.

> BBEdit for Mac:
> Markup > Utilities > Remove Markup.

> I suppose you would have to write an Applescript to make it do this as
> a batch process, though.

-----BEGIN PGP SIGNATURE-----
Version: 2.6

iQIMAwUAP0dXM+tR7En81eJTAQEV5w+3akdzkqPhW57zZnvaSMXRDCKXJTb4dghM
ZYWtjOOxY8dbaGi0JzFXFfvB1IOCuPnNF33PbJkzA1T7AEZaqdCoGKTa9U4pnCHq
c4Mvj3YRXxsJCekhDivIbJ1udX0JQx5hcK2rkMXTicXhMynjgThH+ADp6HHacVzz
57g8LC1smeJvAb7TYesNBFrCujqp5liwIuX4IAUaAcygxW4v86EStaFL+Z9c7Mcd
zzzu8x57ioYREkCpzWK5dzIqEEwvaQNKzzyYhP1XCEzLjrdYMdHJKDA5ZuX8Eb+n
iPJbeI/fiKGRQrye02VzndN0pBtyk7qhZ/Xt4nRUiPCYnte4oQ6fIPSw6hTZ+LR9
De7TDOMG6ofJSzAfte6YHbzJYB621S3SpSYGT28bqJLgMRp0qf3aKIxafUMiqJDA
5P11qAgDhGpO39Cb0yqyc+dcunn1oaNj7jpHngxqb9C5l+hiHOPD9YqluICEVNz3
WWaHwK7UHvhtOYpTwWAWp3Au0ST1cWNNmQK3eU7s2IO5QJwW2W9b0PwQyK0ytSg4
2aenaw/PBlGYGkejjhWJwvk3jC/+xJ/3hidmoc/oV2FCjkhKTA2BIlmvsWT7qkpg
bgNcJ6GplBprVM1d3TsUzz7EL6KmA64HQGcthP0hO74nbvP6FwQPRfrgDSWgHcU=
=Ixot
-----END PGP SIGNATURE-----


