[thelist] Search Engine Algorithms
Ken Schaefer
ken.schaefer at gmail.com
Wed Sep 29 22:22:10 CDT 2004
I'd hardly call the first a sophisticated algorithm.
The second isn't an algorithm at all: the algorithm is implemented
inside SQL Server's Full-Text Search, and you're just using an
"API" (so to speak) to get access to the results.
Any decent search algorithm needs to implement a few key
functionalities (I don't know the technical terms for these things):
a) stem matching: search for "swim", and the search engine needs to
be able to associate swim, swam, swum, swimming (etc) with your search
term
b) proximity: when searching for multiple words, results in which the
words are closer together gain a higher rank.
c) being aware of synonyms or substitutions for the current search
term (as well as common mis-spellings).
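To make (a) to (c) a bit more concrete, here is a rough Python sketch. It is only an illustration, not how any real engine does it: the lookup tables, the function names and the scoring formula are all made up for the example, and the "stemmer" is a crude suffix stripper rather than a proper one such as Porter's.

```python
import difflib
import itertools
import re

# Hypothetical lookup tables, invented for this illustration only.
IRREGULAR_FORMS = {"swam": "swim", "swum": "swim"}
SYNONYMS = {"swim": {"dip"}}          # keyed and valued by stem
VOCABULARY = ["swim", "pool", "dip"]  # known stems, used for spelling fixes


def stem(word):
    """(a) Crude stem matching: irregular-form table plus suffix stripping."""
    word = word.lower()
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if word[-1] == word[-2]:      # swimm -> swim
            word = word[:-1]
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word


def tokenise(text):
    """Split text into lower-case words and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def expand_query(query):
    """(c) Fix likely misspellings, then add synonyms for each query term."""
    groups = []
    for term in tokenise(query):
        if term not in VOCABULARY:
            close = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
            if close:
                term = close[0]
        groups.append({term} | SYNONYMS.get(term, set()))
    return groups


def proximity_score(doc_text, groups):
    """(b) Score a document by the smallest window holding every query term.

    Returns 0.0 if any term is missing, otherwise
    (number of terms) / (window length), so adjacent terms score highest.
    The brute-force search is fine for a sketch but not for large documents.
    """
    tokens = tokenise(doc_text)
    positions = [[i for i, t in enumerate(tokens) if t in g] for g in groups]
    if not groups or any(not p for p in positions):
        return 0.0
    span = min(max(c) - min(c) for c in itertools.product(*positions))
    return len(groups) / (span + 1)


if __name__ == "__main__":
    groups = expand_query("swiming pol")   # misspelled on purpose
    docs = [
        "She swam in the pool",
        "The pool was closed so nobody went swimming",
        "A quick dip in the pool",
        "Tennis lessons",
    ]
    for doc in sorted(docs, key=lambda d: proximity_score(d, groups), reverse=True):
        print(round(proximity_score(doc, groups), 2), doc)
```

A real engine would build an inverted index up front instead of re-tokenising every document per query, and would combine proximity with other ranking signals.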
Implementing these three things is no trivial matter, but is also now
the bare minimum for any decent search engine. No doubt a number of
algorithms need to be combined to deliver the functionality.
Cheers
Ken
On Wed, 29 Sep 2004 09:00:27 -0500, Rob Smith <rob.smith at thermon.com> wrote:
> Hi list,
>
> There are only two ways I know how to implement Search engine algorithms:
>
> 1) Taking the search term(s), splitting into an array if necessary,
> searching on the words to match target columns in databases, and finally
> spitting the results back out.
>
> 2) Set up indexing in SQL Server and doing CONTAINSTABLE operations therein
> which kinda does ranking in the same swing.
>
> What other ways do you know or care to share? Out of sheer curiosity, how
> does Google do it?