[thelist] Scanning for many strings in many texts

manuel.gonzalez.noriega at gmail.com
Thu Oct 13 14:04:16 CDT 2005


On 13/10/05, Hassan Schroeder <hassan at webtuitive.com> wrote:

> I'm wondering now if I understand your goal; why are you searching,
> apparently in the absence of a user query, for '3 million different
> strings'?

Ok, when I said 3 million searches x 100 documents, my brain was
obviously shut off. I was somehow considering one fulltext search per
row, instead of per table. Let's just forget that.

Say, for example, you have to implement alerts for certain terms. The
data space is a bunch of news stories. The alerts can be single-term
strings ('Hassan') or multi-term strings ('Hassan Schroeder').

So, every night you scan the N news stories of the day, looking for
documents that match any of the M predefined 'alerts'. That's
basically the problem.
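
To make the problem concrete, here is a minimal Python sketch of the
naive nightly scan. The function names (`matching_alerts`,
`nightly_scan`) and the sample data are made up for illustration, and a
case-insensitive substring test stands in for whatever fulltext search
the real system would use.

```python
def matching_alerts(story, alerts):
    """Return the alerts that occur in the story (case-insensitive substring test)."""
    text = story.lower()
    return [alert for alert in alerts if alert.lower() in text]


def nightly_scan(stories, alerts):
    """Map each story index to the alerts it matches; costs N x M substring tests."""
    hits = {}
    for i, story in enumerate(stories):
        found = matching_alerts(story, alerts)
        if found:
            hits[i] = found
    return hits


# Hypothetical sample data, not real news stories.
stories = [
    "Hassan Schroeder replied to the list today.",
    "Nothing of interest happened.",
    "A man named Hassan asked a question.",
]
alerts = ["Hassan", "Hassan Schroeder"]
print(nightly_scan(stories, alerts))
```

Every story is tested against every alert, so the work grows as N x M.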

Tokenizing the documents and searching every one against the alerts
table would reduce the number of searches, but it will fail for
multi-term alerts.
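
A small sketch of why that happens, assuming a deliberately naive
lowercase word tokenizer (the `tokenize` and `token_lookup` helpers are
hypothetical, not from any particular library): once the story is
reduced to a set of single tokens, a two-word alert can never equal any
one of them.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def token_lookup(story, alerts):
    """Return the alerts that are equal to some single token of the story."""
    tokens = set(tokenize(story))
    return [alert for alert in alerts if alert.lower() in tokens]


# Hypothetical sample data.
story = "Hassan Schroeder replied to the list today."
alerts = ["Hassan", "Hassan Schroeder"]
print(token_lookup(story, alerts))  # the two-word alert is missed
```

The single-term alert is found with one set lookup, but 'Hassan
Schroeder' is not, even though both words are in the story.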

Now I'm inclined to think that there's no way around it but doing M
fulltext searches :-)



--
Manuel
sometimes :) sometimes :(
but always working hard for Simplelógica: appearance,
experience and communication on the web.
http://simplelogica.net # (+34) 985 22 12 65

Oh! And writing at Logicola: http://logicola.simplelogica.net

