[thelist] LAN search tool

Tony Crockford tonyc at boldfish.co.uk
Wed Dec 12 17:11:06 CST 2001


P.S.

There's a compiled binary for Windows NT that just needs the Cygwin DLL to run.

see:

http://www.htdig.org/files/binaries/

Look for htdig3.15_winnt.readme.


have fun <grin>


> -----Original Message-----
> From: thelist-admin at lists.evolt.org
> [mailto:thelist-admin at lists.evolt.org]On Behalf Of Tony Crockford
> Sent: 12 December 2001 23:09
> To: thelist at lists.evolt.org
> Subject: RE: [thelist] LAN search tool
> 
> 
> 
> > Are you saying it has support for MS Word documents and PDFs?
> > 
> > spinhead
> 
> 
> Yes 
> 
> But it's another step in the indexing process:
> 
> See:
> 
> http://www.htdig.org/files/contrib/parsers/
> 
> 
> Sample external converter script for ht://Dig 3.1.4 and above, that
> converts MS-Word, PDF or PostScript files to text (in HTML form) so
> they can be indexed.  Uses the "catdoc" program to extract text from
> Word documents, "pdftotext" to extract text from PDFs, and "ps2ascii"
> to extract text from PostScript.
> 
> Written by Gilles Detillieux, based on the parse_word_doc.pl script
> by Jesse op den Brouw <MSQL_User at st.hhs.nl>.
> 
> External converters have two advantages over external parsers.  They
> are easier to write, and the parsing is done in a more consistent way
> for all document types.
> 
> 
> 
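For anyone who wants to see the shape of what that description is talking about, here is a rough Python sketch of the dispatch step: pick an extractor by document type, run it, and wrap the text in HTML. This is not Gilles's script. The names `converter_command`, `to_html` and `CONVERTERS`, the argument layout for each tool, and the "script <file> <mime-type>" calling convention are all my own assumptions for illustration, so check them against the tools you actually have installed.

```python
import html
import subprocess
import sys

# MIME type -> function building a command that writes plain text to stdout.
# The tool names come from the description above; the argument layout is an
# assumption and may need adjusting for the versions installed locally.
CONVERTERS = {
    "application/msword": lambda path: ["catdoc", path],
    "application/pdf": lambda path: ["pdftotext", path, "-"],
    "application/postscript": lambda path: ["ps2ascii", path],
}


def converter_command(mime_type, path):
    """Return the argv list for the extractor matching mime_type."""
    try:
        build = CONVERTERS[mime_type]
    except KeyError:
        raise ValueError("no converter for " + mime_type) from None
    return build(path)


def to_html(text, title):
    """Wrap extracted text in a minimal HTML page so it can be indexed."""
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body><pre>\n" + html.escape(text) + "\n</pre></body></html>\n"
    )


def main(argv):
    # Hypothetical calling convention: script.py <file> <mime-type>
    path, mime_type = argv[1], argv[2]
    result = subprocess.run(
        converter_command(mime_type, path),
        capture_output=True, text=True, check=True,
    )
    sys.stdout.write(to_html(result.stdout, path))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

The point of keeping it this small is the one made in the description: the converter only has to produce text in HTML form, and the indexer's normal HTML handling does the rest, so every document type ends up parsed the same way.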



