[thelist] PHP Search Engine
Richard Bennett
richard.bennett at skynet.be
Thu Mar 16 15:05:59 CST 2006
On Thursday 16 March 2006 21:27, Paul Bennett wrote:
> If anyone wants the results of my research so far, let me know. I may save
> you some time
Sure, at least an overview would be nice...
Did you try mnoGoSearch? (It's written in C, not PHP, so it may not be
suitable for the OP.)
I used this one because the MySQL site does.
It takes a lot of work to tune, what with compiling PHP extensions and
whatnot, but it is very extensible in the end. Each document indexed can be
passed to your own program (a shell script or whatever) for pre- or
post-processing. This allows you to do things like OCR a scan, convert the
TIFF to JPEG, and feed the OCR result into the search engine.
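To give a rough idea of what such an external program could look like, here
is a minimal sketch of an OCR filter. It is not taken from my setup and it is
not mnoGoSearch's own API: the function names are made up, and it assumes the
tesseract command-line tool is installed and that the indexer hands the
script a file path and reads plain text back from standard output.

```python
import subprocess


def build_ocr_command(image_path, tool="tesseract"):
    # Assumption: "tesseract <image> stdout" prints the recognised text
    # to standard output instead of writing an output file.
    return [tool, str(image_path), "stdout"]


def normalise(text):
    # Collapse runs of whitespace so the indexer sees clean tokens.
    return " ".join(text.split())


def convert(image_path, run=subprocess.run):
    # "run" is injectable so the filter can be exercised without
    # tesseract being installed.
    result = run(
        build_ocr_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return normalise(result.stdout)


def main(argv):
    # Hypothetical entry point: argv[1] is the scanned image to index.
    print(convert(argv[1]))
```

Saved as a script that calls main(sys.argv), this would be invoked once per
document, and whatever it prints is what would get indexed. The TIFF to JPEG
conversion would be a separate step along the same lines.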
I'm running it on 839 MB of documents in a document server at the moment. The
spider runs hourly, its database is 29 MB, and then there's a folder of
cached documents that's 491 MB.
It is not perfect, and if you want good results you'll put in a lot of time
learning how to tune it, but it is scalable up to a point (you can cluster
multiple servers, for instance) and extremely flexible.
Richard.