[thelist] searching site to include pdf's

Greg Holmes greg.holmes at gmail.com
Fri Aug 20 14:56:15 CDT 2004


Alex Beston wrote:
>I'm adding a search facility to a client's site which includes PDFs.
>Is it possible to search through these documents as well as the
> usual HTML pages?
>If so - any open source stuff out there?

htdig uses an external parser to do this: a script that calls
a utility to extract the text and metadata from the PDF, and
basically feeds the indexer a fake HTML page to be indexed and
associated with the PDF URL.

You might be able to use the same method with another search
engine, even if you don't use htdig.
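
As a rough sketch of the idea (not htdig's actual parser interface),
a wrapper script might look something like the following. It assumes
the pdftotext utility from Xpdf or Poppler is installed; the function
names are made up for illustration, and how the output gets tied to
the PDF URL depends on the search engine you use.

```python
import html
import subprocess


def extract_pdf_text(pdf_path):
    """Return the text of a PDF by calling the pdftotext utility.

    Assumes pdftotext (Xpdf or Poppler) is on the PATH. The trailing
    "-" asks it to write the extracted text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_fake_html(title, text):
    """Wrap extracted text in a minimal HTML page for the indexer."""
    # Treat blank-line-separated chunks as paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
    return (
        "<html><head><title>{}</title></head>\n"
        "<body>\n{}\n</body></html>\n"
    ).format(html.escape(title), body)


def pdf_to_fake_html(pdf_path):
    """Hypothetical entry point: PDF path in, fake HTML page out."""
    # The file name stands in for a title here; a real script might
    # pull the title from the PDF metadata instead.
    return build_fake_html(pdf_path, extract_pdf_text(pdf_path))
```

The indexer would then be given the output of pdf_to_fake_html()
in place of the raw PDF, and would store it under the PDF's URL.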

Greg Holmes

