[thelist] LAN search tool

spinhead evolt at spinhead.com
Wed Dec 12 17:10:59 CST 2001


Well, that's a step closer. Perhaps it's worth a try. Thanks for the extra
effort.

spinhead


----- Original Message -----
From: "Tony Crockford" <tonyc at boldfish.co.uk>
To: <thelist at lists.evolt.org>
Sent: Wednesday, December 12, 2001 3:08 PM
Subject: RE: [thelist] LAN search tool


>
> > Are you saying it has support for MS Word documents and PDFs?
> >
> > spinhead
>
>
> Yes
>
> But it's another step in the indexing process:
>
> See:
>
> http://www.htdig.org/files/contrib/parsers/
>
>
> Sample external converter script for ht://Dig 3.1.4 and above, that
> converts MS-Word, PDF or PostScript files to text (in HTML form) so
> they can be indexed.  Uses the "catdoc" program to extract text from
> Word documents, "pdftotext" to extract text from PDFs, and "ps2ascii"
> to extract text from PostScript.
>
> Written by Gilles Detillieux, based on the parse_word_doc.pl script
> by Jesse op den Brouw <MSQL_User at st.hhs.nl>.
>
> External converters have two advantages over external parsers.  They
> are easier to write, and the parsing is done in a more consistent way
> for all document types.
>
> >
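
For anyone curious what such a converter involves, here is a rough sketch
of the idea described in the quote above: choose an extraction tool by
document type, run it, and wrap the resulting text in HTML so the indexer
can read it. This is not Gilles Detillieux's script (that one is Perl and
lives at the URL above). The MIME-type table, the function names, the
command-line arguments passed to each tool, and the argument order in
main() are all assumptions made for illustration. Only the three tool
names come from the quoted description.

```python
import html
import subprocess
import sys

# Extraction command per MIME type. The tool names come from the quoted
# description; the argument lists are assumptions about typical installs.
EXTRACTORS = {
    "application/msword": lambda path: ["catdoc", path],
    "application/pdf": lambda path: ["pdftotext", path, "-"],
    "application/postscript": lambda path: ["ps2ascii", path],
}


def build_command(mime_type, path):
    """Return the argv list for the tool that handles this MIME type."""
    try:
        return EXTRACTORS[mime_type](path)
    except KeyError:
        raise ValueError(f"no converter for {mime_type}") from None


def text_to_html(text, title):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>"
        "<body><pre>" + html.escape(text) + "</pre></body></html>\n"
    )


def convert(path, mime_type):
    """Run the external tool and return its output as an HTML string."""
    result = subprocess.run(
        build_command(mime_type, path), capture_output=True, check=True
    )
    # Extracted text may contain bytes that are not valid UTF-8.
    text = result.stdout.decode("utf-8", errors="replace")
    return text_to_html(text, title=path)


def main(argv):
    """Hypothetical entry point: argv[1] is the file, argv[2] its MIME type."""
    sys.stdout.write(convert(argv[1], argv[2]))
```

To try it by hand, call main(sys.argv) at the bottom of the file and run
it with a file name and a MIME type. The three tools must be installed
and on the PATH. Because every document type ends up as the same kind of
HTML, the indexer only has to parse one format, which is the consistency
advantage the quoted text mentions.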




