[thelist] PHP Search Engine

Richard Bennett richard.bennett at skynet.be
Thu Mar 16 15:05:59 CST 2006


On Thursday 16 March 2006 21:27, Paul Bennett wrote:
> If anyone wants the results of my research so far, let me know. I may save
> you some time

Sure, at least an overview would be nice...
Did you try mnoGoSearch? (It is written in C rather than PHP, so it may not
suit the OP.)
I chose it because the MySQL site uses it.
Tuning it takes a lot of work, what with compiling PHP extensions and
whatnot, but in the end it is very extensible. Each indexed document can be
passed to your own program (a shell script or whatever) for pre- or
post-processing. That lets you do things like OCR a scan, convert the TIFF
to JPEG, and feed the OCR text into the search engine.
I'm currently running it on a document server holding 839 MB of documents.
The spider runs hourly, the database is 29 MB, and there is also a folder of
cached documents that takes up 491 MB.
It is not perfect, and getting good results means spending a lot of time
learning how to tune it, but it scales up to a point (you can cluster
multiple servers, for instance) and is extremely flexible.
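
To give an idea of what such a processing hook might look like, here is a
minimal sketch of a filter that tidies raw OCR text before it is indexed. It
assumes the indexer is set up to pipe the text through an external program on
stdin/stdout; the function name clean_ocr_text is made up for illustration,
the OCR step itself (done by a separate tool) is not shown, and none of this
is mnoGoSearch's own API.

```python
#!/usr/bin/env python3
"""Hypothetical post-OCR filter: raw OCR text on stdin, cleaned text on stdout."""
import re
import sys


def clean_ocr_text(raw: str) -> str:
    """Normalise OCR output so the indexer sees whole words (illustrative only)."""
    # Replace control characters (e.g. form feeds between pages) with spaces,
    # leaving tabs and newlines for the later steps.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def main() -> None:
    # Assumption: the indexer feeds the document on stdin and reads stdout.
    sys.stdout.write(clean_ocr_text(sys.stdin.read()) + "\n")


if __name__ == "__main__":
    main()
```

How the script gets hooked into the indexer depends on your configuration, so
check the documentation for the external-parser settings before relying on
the stdin/stdout convention assumed here.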

Richard.


