[thelist] Scanning for many strings in many texts

manuel.gonzalez.noriega at gmail.com
Thu Oct 13 10:00:31 CDT 2005


Hi there,

I feel this must be a well-known problem, with established best practices and
patterns available.

Say you have lots of texts and you want to scan them to find
occurrences of some strings. To clarify: you have lots of sports news
stories on one hand and lots of players' names on the other, and you
want to classify the news stories by the players that are mentioned in
them.
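A minimal sketch of the classification itself, assuming names are plain strings and that a "mention" means an exact, case-insensitive match on whole words: tokenize each story once, then look up every run of consecutive words (up to the length of the longest name) in a hash table of names. The function name `classify_by_ngrams` and the sample data are hypothetical.

```python
import re
from collections import defaultdict

def classify_by_ngrams(stories, names):
    """Map each player name to the set of story ids that mention it.

    Hypothetical helper: each story is tokenized once, and every run of
    1..max_len consecutive words is looked up in a hash table of names.
    """
    # Normalize names to tuples of lowercase words,
    # e.g. "David Beckham" -> ("david", "beckham")
    name_keys = {}
    for name in names:
        key = tuple(re.findall(r"\w+", name.lower()))
        if key:
            name_keys[key] = name
    max_len = max((len(k) for k in name_keys), default=0)

    index = defaultdict(set)
    for story_id, text in stories.items():
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words)):
            # Try every word run starting at position i, up to the longest name
            for n in range(1, max_len + 1):
                if i + n > len(words):
                    break
                key = tuple(words[i:i + n])
                if key in name_keys:
                    index[name_keys[key]].add(story_id)
    return index

stories = {
    1: "Raúl scored twice against Valencia.",
    2: "Ronaldo and Raúl trained apart.",
    3: "Raúlito played well.",
}
index = classify_by_ngrams(stories, ["Raúl", "Ronaldo", "David Beckham"])
print(sorted(index["Raúl"]), sorted(index["Ronaldo"]))
```

The work per story grows with its length times the word count of the longest name, not with the number of names, so adding more players costs little beyond memory for the hash table.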

What would be the best strategy in this case? I can think of some very
brute-force solutions, like looping through the names and running a
full-text search on the stories for each one of them, but this is
obviously very primitive and just won't scale.
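One standard alternative to searching once per name is multi-pattern string matching. The sketch below swaps in the Aho-Corasick algorithm (a technique chosen here, not one named in the question): all names are compiled into a single automaton, and each story is then scanned once, character by character, regardless of how many names there are. It assumes case-insensitive matching and treats a hit as a mention only if it is not embedded in a longer word; the class `AhoCorasick`, the function `classify_stories`, and the sample data are hypothetical.

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton over characters (hypothetical helper)."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns recognized when reaching each state
        for pat in patterns:
            self._add(pat)
        self._build_failure_links()

    def _add(self, pat):
        state = 0
        for ch in pat:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pat)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones;
        # depth-1 states keep their failure link to the root (0).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Also report patterns that end at the failure target
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find(self, text):
        """Yield (end_index, pattern) for every occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pat in self.out[state]:
                yield i, pat

def classify_stories(stories, names):
    """Map each name to the set of story ids mentioning it as a whole word."""
    matcher = AhoCorasick([n.lower() for n in names])
    by_pattern = {n.lower(): n for n in names}
    index = {}
    for story_id, text in stories.items():
        lowered = text.lower()
        for end, pat in matcher.find(lowered):
            start = end - len(pat) + 1
            # Skip hits inside longer words ("Raúlito" must not count as "Raúl")
            if start > 0 and lowered[start - 1].isalnum():
                continue
            if end + 1 < len(lowered) and lowered[end + 1].isalnum():
                continue
            index.setdefault(by_pattern[pat], set()).add(story_id)
    return index

stories = {
    1: "Raúl scored twice against Valencia.",
    2: "Ronaldo and Raúl trained apart.",
    3: "Raúlito played well.",
}
print(classify_stories(stories, ["Raúl", "Ronaldo", "David Beckham"]))
```

Building the automaton is linear in the total length of the names, and scanning is linear in the length of each story plus the number of matches, so the cost no longer multiplies names by stories as the per-name full-text search does.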

Many thanks in advance.


--
Manuel
sometimes :) sometimes :(
but always working hard for Simplelógica: look,
experience and communication on the web.
http://simplelogica.net # (+34) 985 22 12 65

Oh! And writing at Logicola: http://logicola.simplelogica.net

