On Thursday 16 March 2006 21:27, Paul Bennett wrote:
> If anyone wants the results of my research so far, let me know. I may save
> you some time

Sure, at least an overview would be nice...

Did you try mnogosearch? (It is written in C, not PHP, so it may not be suitable for the OP.) I used it because the MySQL site does. It takes a lot of work to tune, what with compiling PHP extensions and whatnot, but it is very extensible in the end.

Each document indexed can be passed to your own program (shell script or whatever) for pre- or post-processing. This allows you to do things like OCRing scans, converting the TIFF to JPEG, feeding the OCR result into the search engine, and so on.

I'm running it on 839 megs of documents on a document server at the moment. The spider runs hourly, the database is 29 megs, and then there's a folder of cached documents that's 491 megs.

It is not perfect, and if you want good results you'll put in a lot of time learning how to tune it, but it is scalable up to a point (you can cluster multiple servers, for instance) and extremely flexible.

Richard.
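
To make the pre-processing idea above a bit more concrete, here is a minimal sketch in Python of the kind of external filter a document could be passed to before indexing. It is an illustration only, not the setup described in this message: the extension-to-converter table, the function names, and the choice of tesseract and catdoc as converters are all assumptions, and how the indexer calls the script depends on your own configuration.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical mapping from file extension to an external converter.
# "{src}" is replaced by the path of the document being indexed.
# Both tools are assumed to be installed and to print text to stdout.
CONVERTERS = {
    ".tif": ["tesseract", "{src}", "stdout"],  # OCR a scanned page
    ".doc": ["catdoc", "{src}"],               # extract text from Word files
}


def build_command(path):
    """Return the converter command for this file, or None if none applies."""
    template = CONVERTERS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [part.format(src=path) for part in template]


def convert(path):
    """Return plain text for the document, running a converter if needed."""
    cmd = build_command(path)
    if cmd is None:
        # No converter registered: treat the file as text already.
        return Path(path).read_text(errors="replace")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # The indexer is assumed to read the extracted text from stdout.
    sys.stdout.write(convert(sys.argv[1]))
```

Keeping the converter table separate from the code that runs it means a new document type only needs one extra entry, which fits the "shell script or whatever" style of hook described above.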