[thelist] Search Engine Algorithms

bruce bedouglas at earthlink.net
Sat Oct 2 10:12:15 CDT 2004


if you're interested in search engine algorithms, i'm sure a search of
google will turn up plenty of material!

that said, there is an open source search engine, 'Nutch', whose docs and
code you can dig through to get a feel for how it works...

good luck...

-bruce


-----Original Message-----
From: thelist-bounces at lists.evolt.org
[mailto:thelist-bounces at lists.evolt.org]On Behalf Of Ken Schaefer
Sent: Wednesday, September 29, 2004 8:22 PM
To: thelist at lists.evolt.org
Subject: Re: [thelist] Search Engine Algorithms


I'd hardly call the first a sophisticated algorithm.

The second isn't an algorithm at all: the algorithm is implemented
inside SQL Server's Full-Text Search; you're just using an
"API" (so to speak) to get at the results.

Any decent search algorithm needs to implement a few key
features (I don't know the technical terms for all of these):

a) stem matching: search for "swim", and the search engine needs to be
able to associate swim, swam, swum, swimming (etc.) with your search
term.

b) proximity: when searching for multiple words, documents in which
those words appear closer together should rank higher.

c) being aware of synonyms or substitutions for the current search
term (as well as common misspellings).
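Points a) and c) both come down to mapping a query word onto the set of index terms it should match. The following is a minimal Python sketch of that idea, not how any particular engine does it. The `IRREGULAR` and `SYNONYMS` tables, the suffix list, and the names `stem`, `edit_distance` and `expand_term` are all hypothetical. Note that mapping "swam"/"swum" to "swim" is really a dictionary (lemma) lookup: suffix-stripping stemmers such as Porter's only handle regular endings like "swimming" and "swims". Misspellings are handled with Levenshtein edit distance against the known vocabulary.

```python
# Toy query expansion for points a) and c). All tables are hypothetical examples.

# Irregular forms need a lookup table; suffix stripping alone cannot
# map "swam" or "swum" back to "swim".
IRREGULAR = {"swam": "swim", "swum": "swim", "ran": "run"}

# Hypothetical synonym table, keyed by normalized term.
SYNONYMS = {"swim": {"bathe"}, "run": {"jog"}}


def stem(word):
    """Crude normalizer: irregular lookup, then strip one common suffix."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling after -ing/-ed: "swimm" -> "swim".
            if suffix != "s" and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def expand_term(term, vocabulary, max_typos=1):
    """Return the set of index terms that one query word should match."""
    base = stem(term)
    if base not in vocabulary and vocabulary:
        # Treat an unknown word as a possible misspelling of the nearest known term.
        dist, nearest = min((edit_distance(base, v), v) for v in vocabulary)
        if dist <= max_typos:
            base = nearest
    return {base} | SYNONYMS.get(base, set())


VOCAB = {"swim", "bathe", "run", "jog", "pool"}
for q in ("swimming", "swum", "swimm", "pol", "ran"):
    print(q, "->", sorted(expand_term(q, VOCAB)))
```

A real engine would use a proper stemmer or lemmatizer and a much larger synonym list, but the shape (normalize, correct, then expand) stays the same.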

Implementing these three things is no trivial matter, but they are also
now the bare minimum for any decent search engine. No doubt a number of
algorithms need to be combined to deliver the functionality.
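As a small illustration of combining algorithms, here is a Python sketch that scores documents by how many query words they contain plus a proximity bonus (point b) equal to 1 divided by the length of the shortest run of tokens containing every query word. The scoring formula, the `proximity_weight` parameter and the function names are assumptions made for illustration; real engines weight many more signals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_window(tokens, terms):
    """Length in tokens of the shortest span containing every term, or None."""
    need = set(terms)
    hits = [(i, t) for i, t in enumerate(tokens) if t in need]
    counts = Counter()
    best = None
    left = 0
    for right in range(len(hits)):
        counts[hits[right][1]] += 1
        # Shrink from the left while the window still covers every term.
        while len(counts) == len(need):
            span = hits[right][0] - hits[left][0] + 1
            if best is None or span < best:
                best = span
            t = hits[left][1]
            counts[t] -= 1
            if counts[t] == 0:
                del counts[t]
            left += 1
    return best


def score(doc, query, proximity_weight=1.0):
    """Matched-term count plus a bonus that grows as the terms get closer."""
    tokens = tokenize(doc)
    terms = set(tokenize(query))
    matched = len(terms & set(tokens))
    window = min_window(tokens, terms)
    proximity = 1.0 / window if window else 0.0
    return matched + proximity_weight * proximity


def rank(docs, query):
    return sorted(docs, key=lambda d: score(d, query), reverse=True)


DOCS = [
    "swim lessons at the pool",
    "pool party tonight with music food and later a swim race",
    "outdoor pool swim sessions",
    "pool closed for repairs",
]
for d in rank(DOCS, "swim pool"):
    print(round(score(d, "swim pool"), 2), d)
```

The three documents containing both words tie on term matches, so the proximity bonus alone decides their order; the document with only "pool" ranks last.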

Cheers
Ken


On Wed, 29 Sep 2004 09:00:27 -0500, Rob Smith <rob.smith at thermon.com> wrote:
> Hi list,
>
> There are only two ways I know of to implement search engine algorithms:
>
> 1) Taking the search term(s), splitting into an array if necessary,
> searching on the words to match target columns in databases, and finally
> spitting the results back out.
>
> 2) Set up indexing in SQL Server and do CONTAINSTABLE operations therein,
> which kinda does ranking in the same swing.
>
> What other ways do you know or care to share? Out of sheer curiosity, how
> does Google do it?
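Rob's first approach (split the search terms, match each word against a target column, return the rows) can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch assuming a hypothetical `docs` table with `title` and `body` columns, not Rob's actual code. It shows why Ken calls it unsophisticated: `LIKE '%swim%'` happens to catch "swimming" as a substring but can never match "swam", and the rows come back in storage order with no ranking at all.

```python
import sqlite3


def naive_search(conn, query):
    """Split the query into words; return rows whose body contains every word."""
    terms = query.split()
    if not terms:
        return []
    # One LIKE clause per word; "?" placeholders keep user input out of the SQL.
    # Note: % and _ typed by the user are not escaped here and act as wildcards.
    where = " AND ".join("body LIKE ?" for _ in terms)
    params = [f"%{t}%" for t in terms]
    sql = f"SELECT id, title FROM docs WHERE {where} ORDER BY id"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Pool hours", "The pool is open for swimming every morning."),
        ("Swim team", "Our swim team trains at the pool on weekends."),
        ("Parking", "Parking is free after 6pm."),
    ],
)
print(naive_search(conn, "swim pool"))
print(naive_search(conn, "swam"))  # irregular form: no stem matching, so no hits
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, which is why "parking" would still find the "Parking" row.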
--

* * Please support the community that supports you.  * *
http://evolt.org/help_support_evolt/

For unsubscribe and other options, including the Tip Harvester
and archives of thelist go to: http://lists.evolt.org
Workers of the Web, evolt !


