[thelist] Search Function on Website

Ken Schaefer Ken at adOpenStatic.com
Tue May 16 08:18:03 CDT 2006


Surely if you want decent search functionality these days you buy a 3rd-party
product that supports stemming (and other linguistic techniques I don't even
know the names of), so that a search for "swim" would also return pages that
contain "swum", and possibly "swan", but with a much lower ranking. Simple
matching, or catering for misspellings via simple algorithms, is surely so
"1990s" search technology?
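To make the stemming idea concrete, here is a toy suffix-stripping stemmer in PHP. The function name `toyStem` and its suffix list are hypothetical, chosen only for illustration; real products use much larger rule sets (the Porter stemmer is the classic example). An irregular form like "swum" needs a dictionary lookup (lemmatisation) rather than suffix stripping, so this sketch leaves it unchanged.

```php
<?php
// Toy suffix-stripping stemmer, for illustration only. The suffix list is a
// hypothetical assumption; it does NOT handle irregular forms such as "swum".
function toyStem(string $word): string
{
    $word = strtolower($word);
    // Leave words like "class" or "pass" alone rather than stripping an "s".
    if (substr($word, -2) === 'ss') {
        return $word;
    }
    // Longer suffixes are listed first so "ers" is tried before "s".
    $suffixes = ['ing', 'ers', 'er', 'ed', 'es', 's'];
    foreach ($suffixes as $suffix) {
        $len = strlen($suffix);
        // Only strip if a stem of at least 3 characters remains.
        if (strlen($word) - $len >= 3 && substr($word, -$len) === $suffix) {
            $word = substr($word, 0, -$len);
            break;
        }
    }
    // Undouble a final consonant left behind, e.g. "swimm" -> "swim",
    // but keep doubled l/s and vowels ("fall", "class").
    $n = strlen($word);
    if ($n >= 2 && $word[$n - 1] === $word[$n - 2]
        && strpos('aeiouls', $word[$n - 1]) === false) {
        $word = substr($word, 0, -1);
    }
    return $word;
}
```

Indexing pages and queries on `toyStem()` output means a search for "swimming" also matches pages containing "swims" or "swimmers", which is the effect described above, minus the ranking.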

Cheers
Ken

--
My IIS Blog: www.adOpenStatic.com/cs/blogs/ken
Tech.Ed Boston 2006 (see you there): Everything the web administrator needs to
know about MOM 2005

:  -----Original Message-----
:  From: thelist-bounces at lists.evolt.org
:  [mailto:thelist-bounces at lists.evolt.org] On Behalf Of kasimir-k
:  Sent: Tuesday, 16 May 2006 8:37 PM
:  To: justin at jazzmanagement.com.au; thelist at lists.evolt.org
:  Subject: Re: [thelist] Search Function on Website
:  
:  Justin Zachan wrote on 16/05/2006 4:56:
:  > Does anyone have suggestions on how best to allow the search
:  > function to tolerate spelling mistakes?
:  
:  There are various ways to determine the similarity of two given strings
:  (which is what you need to do to allow for spelling mistakes).
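PHP itself ships with several such measures: `levenshtein()` (edit distance), `similar_text()`, and the phonetic `soundex()` and `metaphone()`. A minimal "did you mean" sketch using `levenshtein()`, assuming a hypothetical word list `$words` drawn from the site's content and an arbitrary cut-off of two edits (`closestWord` is likewise a hypothetical name):

```php
<?php
// Return the dictionary word with the smallest edit distance to the query,
// or null if even the best candidate needs more than $maxDistance edits.
// $dictionary is assumed to be a plain array of words from the site.
function closestWord(string $query, array $dictionary, int $maxDistance = 2): ?string
{
    $query = strtolower($query);
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($dictionary as $word) {
        // Compare case-insensitively; levenshtein() counts inserts,
        // deletes and substitutions.
        $d = levenshtein($query, strtolower($word));
        if ($d < $bestDistance) {
            $best = $word;
            $bestDistance = $d;
        }
    }
    // Reject suggestions too far away to be plausible typos.
    return $bestDistance <= $maxDistance ? $best : null;
}

$words = ['jazz', 'management', 'musician', 'festival'];
echo closestWord('managment', $words), "\n"; // management (one missing letter)
```

Edit distance works well for typos in single words; the pair-based approach below is more forgiving of reordered words in longer phrases.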
:  
:  Simon White has a nice introduction to the subject here:
:  http://www.catalysoft.com/articles/MatchingSimilarStrings.html
:  And he presents his own approach here:
:  http://www.catalysoft.com/articles/StrikeAMatch.html
:  
:  Below is a PHP version of the algorithm:
:  
:  /**
:   * String similarity based on common character pairs (bigrams).
:   * @param  string $str0
:   * @param  string $str1
:   * @return float  similarity value, from 0 to 1
:   */
:  function strSim($str0, $str1) {
:      $pairs = array();
:      foreach (array($str0, $str1) as $p => $s) {
:          // collapse runs of whitespace and pad with one space at each end
:          $str = ' ' . trim(preg_replace('/\s+/', ' ', $s)) . ' ';
:          $pairs[$p] = array();
:          // collect every adjacent character pair, upper-cased
:          for ($i = 0, $ii = strlen($str) - 1; $i < $ii; $i++) {
:              $pairs[$p][] = strtoupper(substr($str, $i, 2));
:          }
:      }
:      $intersection = 0;
:      $union = count($pairs[0]) + count($pairs[1]);
:      foreach ($pairs[0] as $pair) {
:          // strict search: a loose one would treat numeric pairs such as
:          // " 1" and "01" as equal; unset the match so it counts only once
:          if (($key = array_search($pair, $pairs[1], true)) !== false) {
:              $intersection++;
:              unset($pairs[1][$key]);
:          }
:      }
:      return 2 * $intersection / $union;
:  }
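The function above pads the whole string with spaces and takes pairs across it, so pairs such as " F", or a pair spanning two words, also count. White's article instead takes letter pairs within each word only. A PHP sketch of that per-word variant follows; `letterPairsPerWord()` and `pairSimilarity()` are hypothetical names, not taken from the article.

```php
<?php
// Collect adjacent letter pairs within each whitespace-separated word,
// upper-cased, so no pair spans a space.
function letterPairsPerWord(string $str): array
{
    $pairs = [];
    $words = preg_split('/\s+/', strtoupper($str), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        for ($i = 0, $n = strlen($word) - 1; $i < $n; $i++) {
            $pairs[] = substr($word, $i, 2);
        }
    }
    return $pairs;
}

// Similarity = 2 * |shared pairs| / (|pairs in a| + |pairs in b|).
function pairSimilarity(string $a, string $b): float
{
    $pairs1 = letterPairsPerWord($a);
    $pairs2 = letterPairsPerWord($b);
    $union = count($pairs1) + count($pairs2);
    if ($union === 0) {
        return 0.0; // neither string has a word of two or more letters
    }
    $intersection = 0;
    foreach ($pairs1 as $pair) {
        // strict comparison; remove the match so each pair counts once
        $key = array_search($pair, $pairs2, true);
        if ($key !== false) {
            $intersection++;
            unset($pairs2[$key]);
        }
    }
    return 2 * $intersection / $union;
}
```

For "France" and "French" this gives 0.4 (the shared pairs FR and NC, out of ten pairs in total), and for "Healed" and "Sealed" it gives 0.8.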
:  
:  
:  hth,
:  .k



