[thelist] searching site to include pdf's
Greg Holmes
greg.holmes at gmail.com
Fri Aug 20 14:56:15 CDT 2004
Alex Beston wrote:
>I'm adding a search facility to a client's site, which includes PDFs.
>Is it possible to search through these documents as well as the
> usual HTML pages?
>If so, is there any open source stuff out there?
htdig uses an external parser to do this: a script that calls
a utility to extract the text and metadata, and then feeds
the indexer a fake HTML page to be indexed and associated with
the PDF URL.
You might be able to use the same method with another search
engine, even if you don't use htdig.
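
As a rough illustration of that method (not htdig's actual parser script), a
wrapper could look something like the sketch below. It assumes the pdftotext
utility (from xpdf or poppler) is installed and on the PATH. The function
names and the script name are made up for the example, and how the resulting
page gets handed to the indexer depends on the search engine you pick.

```python
import html
import subprocess
import sys


def extract_pdf_text(pdf_path):
    """Return the text of a PDF by calling the pdftotext utility.

    Assumption: pdftotext is installed and on PATH. The trailing "-"
    asks it to write the extracted text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_fake_html(title, body_text):
    """Wrap extracted text in a minimal HTML page for an indexer."""
    # Escape the text so characters such as "<" are not read as markup.
    safe_title = html.escape(title)
    # Treat blank-line-separated chunks as paragraphs.
    paragraphs = [p.strip() for p in body_text.split("\n\n") if p.strip()]
    body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
    return (
        "<html><head><title>{}</title></head>\n"
        "<body>\n{}\n</body></html>\n"
    ).format(safe_title, body)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pdf2html.py report.pdf > report.html
    pdf_path = sys.argv[1]
    sys.stdout.write(build_fake_html(pdf_path, extract_pdf_text(pdf_path)))
```

Here the file name stands in for the title. A fuller version could pull the
real title and other metadata from the PDF and put them in the head of the
fake page.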
Greg Holmes