[thelist] Search Engine Algorithms
bruce
bedouglas at earthlink.net
Sat Oct 2 10:12:15 CDT 2004
if you're interested in search engine algorithms, i'm sure a google search
will turn up plenty of material.
that said, there is an open source search engine, 'nutch', whose docs and
code you can read through to get a feel for how it works...
good luck...
-bruce
-----Original Message-----
From: thelist-bounces at lists.evolt.org
[mailto:thelist-bounces at lists.evolt.org]On Behalf Of Ken Schaefer
Sent: Wednesday, September 29, 2004 8:22 PM
To: thelist at lists.evolt.org
Subject: Re: [thelist] Search Engine Algorithms
I'd hardly call the first a sophisticated algorithm.
The second isn't an algorithm at all - the algorithm is implemented
inside SQL Server Full Text Search capabilities - you're just using an
"API" (so to speak) to get access to the results.
Any decent search algorithm needs to implement a few key
functionalities (I don't know the technical terms for these things):
a) stem matching: search for "swim", and the search engine needs to be
able to associate swim, swam, swum, swimming (etc.) with your search
term.
b) proximity: when searching for multiple words, matches where the words
are closer together gain a higher rank.
c) being aware of synonyms or substitutions for the current search
term (as well as common mis-spellings).
Implementing these three things is no trivial matter, but is also now
the bare minimum for any decent search engine. No doubt a number of
algorithms need to be combined to deliver the functionality.
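To make (a), (b) and (c) concrete, here is a toy Python sketch of how the three might be combined. Everything in it is an assumption made for illustration: the lookup tables STEMS, SYNONYMS and MISSPELLINGS are hand-written stand-ins for a real stemmer, thesaurus and spelling corrector, and the 1 / (1 + span) score is just one simple way to reward proximity, not how any particular engine does it.

```python
import re
from itertools import product

# Hypothetical hand-written tables; a real engine would use a proper
# stemmer, thesaurus and spelling corrector instead.
STEMS = {"swam": "swim", "swum": "swim", "swimming": "swim", "swims": "swim"}
SYNONYMS = {"swim": {"bathe"}, "pool": {"baths"}}
MISSPELLINGS = {"swiming": "swimming", "pol": "pool"}


def normalize(word):
    """Lower-case, fix a known misspelling, then reduce the word to its stem."""
    word = word.lower()
    word = MISSPELLINGS.get(word, word)
    return STEMS.get(word, word)


def tokenize(text):
    """Split text into normalized word tokens."""
    return [normalize(w) for w in re.findall(r"[a-z']+", text.lower())]


def expand(term):
    """Return the normalized term together with its normalized synonyms."""
    base = normalize(term)
    return {base} | {normalize(s) for s in SYNONYMS.get(base, set())}


def score(query, document):
    """Return 0.0 if any query term is missing from the document;
    otherwise return a value that grows as the terms sit closer together."""
    tokens = tokenize(document)
    positions = []
    for term in query.split():
        wanted = expand(term)
        hits = [i for i, tok in enumerate(tokens) if tok in wanted]
        if not hits:
            return 0.0
        positions.append(hits)
    if not positions:
        return 0.0
    # Width of the smallest window holding one hit for every query term.
    span = min(max(combo) - min(combo) for combo in product(*positions))
    return 1.0 / (1 + span)


docs = ["She swam in the pool", "The swimming pool is closed", "Pool tables are fun"]
ranked = sorted(docs, key=lambda d: score("swimming pool", d), reverse=True)
```

With these tables, "swam" and "swimming" both reduce to "swim", so the first two documents match the query and the one with the words adjacent ranks first. The brute-force product() over hit positions is fine for a toy but would need replacing for long documents.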
Cheers
Ken
On Wed, 29 Sep 2004 09:00:27 -0500, Rob Smith <rob.smith at thermon.com> wrote:
> Hi list,
>
> There are only two ways I know how to implement Search engine algorithms:
>
> 1) Taking the search term(s), splitting into an array if necessary,
> searching on the words to match target columns in databases, and finally
> spitting the results back out.
>
> 2) Set up indexing in SQL Server and doing CONTAINSTABLE operations
> therein which kinda does ranking in the same swing.
>
> What other ways do you know or care to share? Out of sheer curiosity, how
> does Google do it?
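For reference, approach 1) in the quoted message can be sketched roughly as below. This is a minimal illustration in Python with SQLite, not Rob's actual code: the pages table, its title and body columns, and the naive_search function are all hypothetical names, and counting how many terms matched is just one crude way to order the results.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database with a hypothetical 'pages' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pages (title, body) VALUES (?, ?)",
        [
            ("Swimming pools", "Opening hours for the pool"),
            ("Search engines", "How ranking works"),
            ("Pool tables", "Swimming not included"),
        ],
    )
    return conn


def naive_search(conn, query):
    """Split the query into words, LIKE-match each word against the target
    columns, and order rows by how many of the words matched."""
    terms = query.split()
    if not terms:
        return []
    # Each clause evaluates to 1 or 0 in SQLite, so summing them counts
    # the number of query terms found in a row.
    clause = "(title LIKE ? OR body LIKE ?)"
    hits_expr = " + ".join([clause] * len(terms))
    sql = (
        "SELECT id, title, hits FROM "
        f"(SELECT id, title, {hits_expr} AS hits FROM pages) "
        "WHERE hits > 0 ORDER BY hits DESC, id"
    )
    params = []
    for term in terms:
        pattern = f"%{term}%"  # note: % and _ in the term act as wildcards
        params += [pattern, pattern]
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
results = naive_search(conn, "pool hours")
```

The sketch shows the limits Ken points out: a search for "swim" finds "Swimming" only because of substring matching, "swam" would never match, and word distance plays no part in the ordering. A LIKE pattern with a leading wildcard also generally cannot use an ordinary index, so this scans every row.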
--
* * Please support the community that supports you. * *
http://evolt.org/help_support_evolt/
For unsubscribe and other options, including the Tip Harvester
and archives of thelist go to: http://lists.evolt.org
Workers of the Web, evolt !