[thelist] LAN search tool
Tony Crockford
tonyc at boldfish.co.uk
Wed Dec 12 17:11:06 CST 2001
P.S.
There's a compiled binary for Windows NT that just needs the Cygwin DLL to run.
See:
http://www.htdig.org/files/binaries/
under htdig3.15_winnt.readme.
have fun <grin>
> -----Original Message-----
> From: thelist-admin at lists.evolt.org
> [mailto:thelist-admin at lists.evolt.org]On Behalf Of Tony Crockford
> Sent: 12 December 2001 23:09
> To: thelist at lists.evolt.org
> Subject: RE: [thelist] LAN search tool
>
>
>
> > Are you saying it has support for MS Word documents and PDFs?
> >
> > spinhead
>
>
> Yes
>
> But it's another step in the indexing process:
>
> See:
>
> http://www.htdig.org/files/contrib/parsers/
>
>
> Sample external converter script for ht://Dig 3.1.4 and above, that
> converts MS-Word, PDF or PostScript files to text (in HTML form) so
> they can be indexed. Uses the "catdoc" program to extract text from
> Word documents, "pdftotext" to extract text from PDFs, and "ps2ascii"
> to extract text from PostScript.
>
> Written by Gilles Detillieux, based on the parse_word_doc.pl script
> by Jesse op den Brouw <MSQL_User at st.hhs.nl>.
>
> External converters have two advantages over external parsers. They
> are easier to write, and the parsing is done in a more consistent way
> for all document types.
>
>
>
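To make the "external converter" idea in the quoted message a bit more concrete, here is a rough sketch in Python. It is not Gilles Detillieux's script (that one is Perl); it only illustrates the same dispatch: pick catdoc, pdftotext or ps2ascii by MIME type, then wrap the extracted text in minimal HTML on stdout. The argument order (input file, MIME type, URL) is my assumption about how ht://Dig invokes a converter, and the names build_command, wrap_as_html and EXTRACTORS are made up for illustration, so check the ht://Dig documentation before relying on any of it.

```python
import html
import subprocess
import sys

# Text extractor per MIME type; these are the three tools named in the
# quoted description. The mapping itself is an illustrative assumption.
EXTRACTORS = {
    "application/msword": ["catdoc"],
    "application/pdf": ["pdftotext"],
    "application/postscript": ["ps2ascii"],
}


def build_command(mime_type, path):
    """Return the argv list that extracts plain text from `path`."""
    base = EXTRACTORS.get(mime_type.lower())
    if base is None:
        raise ValueError("no converter for " + mime_type)
    cmd = base + [path]
    if base[0] == "pdftotext":
        cmd.append("-")  # "-" tells pdftotext to write to stdout
    return cmd


def wrap_as_html(text, title):
    """Wrap extracted text in a minimal HTML page, escaping markup characters."""
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + html.escape(text) + "\n</body></html>\n"
    )


def main(argv):
    # Assumed argument order: input file, MIME type, URL (extras ignored).
    if len(argv) < 4:
        sys.stderr.write("usage: converter FILE MIME-TYPE URL\n")
        return 2
    path, mime_type, url = argv[1], argv[2], argv[3]
    result = subprocess.run(
        build_command(mime_type, path), capture_output=True, check=True
    )
    text = result.stdout.decode("utf-8", errors="replace")
    sys.stdout.write(wrap_as_html(text, url))
    return 0


# Only act when arguments were supplied, so importing the file is a no-op.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The point of doing it this way is the one made in the quoted text: the converter only has to turn a document into HTML, and ht://Dig's own HTML parser does the rest, so every document type gets parsed the same way.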