[thelist] Scanning for many strings in many texts

Hassan Schroeder hassan at webtuitive.com
Thu Oct 13 14:54:55 CDT 2005


manuel.gonzalez.noriega at gmail.com wrote:

> So, every night you scan the N news stories of the day, seeking
> documents that match any of the M predefined 'alerts'. That's
> basically the problem.

I don't know if there's anything published about the optimizations,
but Verity's Topic was originally created to address exactly that
kind of issue for the CIA/NSA -- so we're talking about a *fairly*
high volume of incoming data :-)

> Now I'm inclined to think that there's no way around it but doing M
> fulltext searches :-)

The above aside, I'm inclined to believe you're right.

Practically speaking, it may just be a matter of throwing hardware
at it.
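
For what it's worth, the brute-force version (each of the M alerts
checked against each of the N stories) could be sketched roughly like
this in Python. The function name, alert names, phrases, and stories
are all made up for illustration, and plain case-insensitive substring
matching stands in for whatever full-text search you actually use:

```python
# Brute-force sketch: run each of the M alerts against each of the N stories.
# All names and data here are hypothetical, for illustration only.

def match_alerts(stories, alerts):
    """Return {alert_name: [story_id, ...]} for every alert with a hit.

    stories: dict mapping story id -> story text
    alerts:  dict mapping alert name -> list of phrases (any phrase matches)
    """
    # Lowercase each story once so it is not redone for every alert.
    lowered = {sid: text.lower() for sid, text in stories.items()}
    hits = {}
    for name, phrases in alerts.items():          # M alerts
        needles = [p.lower() for p in phrases]
        for sid, text in lowered.items():         # N stories
            if any(n in text for n in needles):   # plain substring match
                hits.setdefault(name, []).append(sid)
    return hits


stories = {
    1: "Central bank raises interest rates again",
    2: "Local team wins the championship",
    3: "Oil prices fall as interest rates climb",
}
alerts = {
    "rates": ["interest rates"],
    "sports": ["championship", "playoffs"],
    "weather": ["hurricane"],
}
result = match_alerts(stories, alerts)
```

The work grows as M x N, but each story is checked independently of
the others, so the nightly batch can be split across machines, which
is where the extra hardware would go.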

Good luck!
-- 
Hassan Schroeder ----------------------------- hassan at webtuitive.com
Webtuitive Design ===  (+1) 408-938-0567   === http://webtuitive.com

                          dream.  code.



