[thelist] LAN search tool
spinhead
evolt at spinhead.com
Wed Dec 12 17:10:59 CST 2001
Well, that's a step closer. Perhaps it's worth a try. Thanks for the extra
effort.
spinhead
----- Original Message -----
From: "Tony Crockford" <tonyc at boldfish.co.uk>
To: <thelist at lists.evolt.org>
Sent: Wednesday, December 12, 2001 3:08 PM
Subject: RE: [thelist] LAN search tool
>
> > Are you saying it has support for MS Word documents and PDFs?
> >
> > spinhead
>
>
> Yes
>
> But it's another step in the indexing process:
>
> See:
>
> http://www.htdig.org/files/contrib/parsers/
>
>
> Sample external converter script for ht://Dig 3.1.4 and above, that
> converts MS-Word, PDF or PostScript files to text (in HTML form) so
> they can be indexed. Uses the "catdoc" program to extract text from
> Word documents, "pdftotext" to extract text from PDFs, and "ps2ascii"
> to extract text from PostScript.
>
> Written by Gilles Detillieux, based on the parse_word_doc.pl script
> by Jesse op den Brouw <MSQL_User at st.hhs.nl>.
>
> External converters have two advantages over external parsers. They
> are easier to write, and the parsing is done in a more consistent way
> for all document types.
>
> >
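
For anyone wanting to see what such a converter amounts to, here is a minimal sketch in Python of the idea described in the quoted text: pick an extraction program by content type, run it, and wrap the plain text in HTML so it can be indexed. It is not the conv_doc script itself. The function names, the MIME-type table, and the argument order (input file first, content type second) are assumptions made for illustration, and catdoc, pdftotext and ps2ascii are assumed to be on the PATH.

```python
import html
import subprocess
import sys

# Assumed mapping from content type to extraction command.
# "{}" marks where the input file path goes.
EXTRACTORS = {
    "application/msword": ["catdoc", "{}"],
    "application/pdf": ["pdftotext", "{}", "-"],  # "-" writes text to stdout
    "application/postscript": ["ps2ascii", "{}"],
}


def build_command(content_type, path):
    """Return the argv list for the extractor that handles content_type."""
    template = EXTRACTORS.get(content_type)
    if template is None:
        raise ValueError("no converter for " + content_type)
    return [path if part == "{}" else part for part in template]


def wrap_html(text, title=""):
    """Wrap extracted plain text in a bare HTML page, escaping markup."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>\n{}\n</pre></body></html>\n"
    ).format(html.escape(title), html.escape(text))


def convert(path, content_type):
    """Run the external extractor and return its output as HTML."""
    result = subprocess.run(
        build_command(content_type, path),
        stdout=subprocess.PIPE,
        check=True,
    )
    # The output encoding is a guess; undecodable bytes are replaced.
    text = result.stdout.decode("latin-1", errors="replace")
    return wrap_html(text, title=path)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Assumed argument order: input file, then content type.
    sys.stdout.write(convert(sys.argv[1], sys.argv[2]))
```

If memory serves, a script like this is hooked in through the external_parsers attribute in htdig.conf, with an entry along the lines of "application/pdf->text/html /path/to/script", but check the ht://Dig attribute documentation for the exact syntax and the arguments it passes before relying on that.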