[thelist] Search Engine Algorithms

Ken Schaefer ken.schaefer at gmail.com
Wed Sep 29 22:22:10 CDT 2004


I'd hardly call the first a sophisticated algorithm.

The second isn't an algorithm at all - the algorithm is implemented
inside SQL Server's Full-Text Search - you're just using an "API"
(so to speak) to get access to the results.

Any decent search algorithm needs to implement a few key
functionalities (I don't know the technical terms for these things):

a) stem matching: search for "swim", and the search engine needs to be
able to associate swim, swam, swum, swimming (etc) with your search
term

b) proximity: when searching for multiple words, documents in which
those words appear closer together should get a higher rank.

c) being aware of synonyms or substitutions for the current search
term (as well as common misspellings).
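
To make (a) and (c) a bit more concrete, here is a rough Python sketch
of query expansion. It is only an illustration: the lookup tables and
the function names (normalize, expand_query, matches) are made up for
this example, and the suffix stripping is deliberately naive. A real
engine would use a proper stemmer and a real thesaurus.

```python
import re

# Hypothetical lookup tables; a real engine would build these from a
# dictionary or from query logs rather than hard-coding them.
IRREGULAR_FORMS = {"swam": "swim", "swum": "swim"}
SYNONYMS = {"swim": {"bathe"}}
MISSPELLINGS = {"swiming": "swimming"}


def normalize(word):
    """Map a word to a base form: fix known misspellings, then stem."""
    word = word.lower()
    word = MISSPELLINGS.get(word, word)
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    # Naive suffix stripping that also undoes a doubled final consonant,
    # so "swimming" becomes "swim". It will mangle plenty of other words.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


def expand_query(term):
    """Return the base form of the term plus any known synonyms."""
    base = normalize(term)
    return {base} | SYNONYMS.get(base, set())


def matches(term, document):
    """True if any word in the document normalizes to a wanted form."""
    wanted = expand_query(term)
    return any(normalize(w) in wanted for w in re.findall(r"\w+", document))
```

With these tables, matches("swim", "She swam across the lake.") is true
because "swam" is mapped back to "swim" before the comparison.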

Implementing these three things is no trivial matter, but is also now
the bare minimum for any decent search engine. No doubt a number of
algorithms need to be combined to deliver the functionality.
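
As one example of the pieces that would have to be combined, proximity
ranking (b) could be sketched in Python roughly like this. The scoring
formula (number of query terms divided by the width of the tightest
window that contains them all) is just an assumption picked for the
example, as are the function names; it is not how any particular
engine does it.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def smallest_span(terms, text):
    """Width in words of the tightest window holding every term,
    or None if the term list is empty or some term is absent."""
    if not terms:
        return None
    words = tokenize(text)
    positions = []
    for term in terms:
        hits = [i for i, w in enumerate(words) if w == term.lower()]
        if not hits:
            return None
        positions.append(hits)
    # Brute force over one position per term. Fine for a sketch, too
    # slow for long documents with many repeated terms.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def proximity_score(terms, text):
    """Higher when the terms sit closer together; 0.0 if any is missing."""
    span = smallest_span(terms, text)
    if span is None:
        return 0.0
    return len(terms) / span


def rank(query, documents):
    """Return matching documents, best proximity score first."""
    terms = list(dict.fromkeys(tokenize(query)))  # drop duplicate terms
    scored = [(proximity_score(terms, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]
```

A document with "search engine" as adjacent words would come out ahead
of one where "search" and "engine" are several words apart, and a
document missing either word is dropped from the results.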

Cheers
Ken


On Wed, 29 Sep 2004 09:00:27 -0500, Rob Smith <rob.smith at thermon.com> wrote:
> Hi list,
> 
> There are only two ways I know how to implement Search engine algorithms:
> 
> 1) Taking the search term(s), splitting into an array if necessary,
> searching on the words to match target columns in databases, and finally
> spitting the results back out.
> 
> 2) Setting up full-text indexing in SQL Server and doing CONTAINSTABLE
> operations therein, which kinda does ranking in the same swing.
> 
> What other ways do you know or care to share? Out of sheer curiosity, how
> does Google do it?

