[thelist] Scanning for many strings in many texts

manuel.gonzalez.noriega at gmail.com
Thu Oct 13 18:32:42 CDT 2005


On 13/10/05, Matthias Willerich <matthias at die-legendaeren.de> wrote:
> Manuel,
> that problem really interests me, sadly I'm not an expert.
>
> I'm thinking about it from 2 sides:

Hi Matthias,

You caught me in the middle of a hacking session! :)

Funny that you mention tagging: what I'm exploring is a concept I
refer to as 'implicit tagging' (surely there's a better name
somewhere), that is, the fact that certain words appearing in the
content can be treated as actual tags of that content for certain
purposes.

Right now I'm just implementing what we discussed previously: running a
fulltext search for every term over the dataset and tagging the matching
documents accordingly. With 200+ 'predefined tags' and 100+ documents to
explore, it runs very fast (MySQL on an iBook).
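A minimal sketch of that loop, in Python: for each predefined tag, search every document and attach the tag to the documents that contain it. The original uses MySQL fulltext searches; here a case-insensitive whole-word regex stands in for them, and the function name `implicit_tags` and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def implicit_tags(documents, predefined_tags):
    """Return a dict mapping doc_id -> set of predefined tags found in its text.

    documents: dict of doc_id -> text
    predefined_tags: iterable of plain-word tag strings (assumption: tags made of
    word characters, since the \\b word-boundary anchors expect that)
    """
    tagged = defaultdict(set)
    # Outer loop over tags: one "search" per tag, as in the approach described.
    for tag in predefined_tags:
        # Whole-word, case-insensitive match as a stand-in for a fulltext query.
        pattern = re.compile(r"\b" + re.escape(tag) + r"\b", re.IGNORECASE)
        for doc_id, text in documents.items():
            if pattern.search(text):
                tagged[doc_id].add(tag)
    # Documents that matched no tag are simply absent from the result.
    return dict(tagged)


if __name__ == "__main__":
    docs = {
        1: "Notes on CSS layout and PHP templates",
        2: "MySQL fulltext search tips",
        3: "A trip to Asturias",
    }
    result = implicit_tags(docs, ["css", "php", "mysql", "search"])
    print({k: sorted(v) for k, v in result.items()})
    # prints {1: ['css', 'php'], 2: ['mysql', 'search']}
```

With 200 tags and 100 documents this is 20,000 pattern checks, which is small; for much larger collections, building an index of words to documents once would avoid rescanning every text for every tag.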

On the subject of searching, everything I could explain in my broken
English is much better explained in the excellent 'On Search' series
by Tim Bray :)

http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC

I'll update this thread if I make any significant progress in my endeavours :)

--
Manuel
sometimes :) sometimes :(
but always working hard for Simplelógica: look,
experience and communication on the web.
http://simplelogica.net # (+34) 985 22 12 65

Oh! And writing on Logicola: http://logicola.simplelogica.net

