[thelist] Search Function on Website
Ken Schaefer
Ken at adOpenStatic.com
Tue May 16 08:18:03 CDT 2006
Surely if you want decent search functionality these days, you buy a third-party
product that supports stemming (and other linguistic techniques whose names I
don't even know), so that a search for "swim" would also return pages that
contain "swum", and might return "swan", but with a much lower ranking. Simple
matching, or catering for misspellings with simple algorithms, is surely very
"1990s" search technology?
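To make "stemming" a little more concrete: a stemmer reduces each word to a common root before matching, so a query word and a page word match when their roots agree. The following is a deliberately tiny PHP sketch, not how any particular product works; `toyStem()`, `stemMatch()`, the suffix list and the table of irregular forms are all hypothetical. Note that stripping suffixes alone cannot turn "swum" into "swim"; irregular forms need a dictionary lookup (usually called lemmatisation), which the small table stands in for here.

```php
<?php
// Hypothetical toy stemmer: lower-cases a word and reduces it to a root.
// The irregular-form table and the suffix list are illustrative only.
function toyStem($word) {
    $word = strtolower($word);
    // Irregular forms cannot be reached by stripping suffixes, so look
    // them up first (a stand-in for a real lemmatisation dictionary).
    $irregular = array('swam' => 'swim', 'swum' => 'swim');
    if (isset($irregular[$word])) {
        return $irregular[$word];
    }
    // Strip the first matching suffix, as long as 3+ characters remain.
    foreach (array('ming', 'ing', 's') as $suffix) {
        $len = strlen($suffix);
        if (strlen($word) - $len >= 3 && substr($word, -$len) === $suffix) {
            return substr($word, 0, -$len);
        }
    }
    return $word;
}

// Hypothetical helper: a query word matches a page word when stems agree.
function stemMatch($query, $pageWord) {
    return toyStem($query) === toyStem($pageWord);
}

var_dump(stemMatch('swim', 'swum'));     // bool(true)
var_dump(stemMatch('swim', 'Swimming')); // bool(true)
var_dump(stemMatch('swim', 'swan'));     // bool(false)
```

A real product would use a proper stemming algorithm plus a dictionary and would also rank near matches, rather than giving a plain yes/no as this sketch does.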
Cheers
Ken
--
My IIS Blog: www.adOpenStatic.com/cs/blogs/ken
Tech.Ed Boston 2006, see you there: "Everything the web administrator needs to
know about MOM 2005"
: -----Original Message-----
: From: thelist-bounces at lists.evolt.org
: [mailto:thelist-bounces at lists.evolt.org] On Behalf Of kasimir-k
: Sent: Tuesday, 16 May 2006 8:37 PM
: To: justin at jazzmanagement.com.au; thelist at lists.evolt.org
: Subject: Re: [thelist] Search Function on Website
:
: Justin Zachan wrote on 16/05/2006 4:56:
: > Does anyone have suggestions on how best to make the search
: > function allow for spelling mistakes?
:
: There are various ways to determine the similarity of two given strings
: (which is what you want to do to allow for spelling mistakes).
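For comparison, PHP already ships with string-similarity functions such as levenshtein() (edit distance: the number of single-character insertions, deletions and substitutions needed to turn one string into the other) and similar_text(). A minimal "did you mean?" sketch using levenshtein() might look like this; `suggest()`, the word list and the two-edit cut-off are hypothetical choices, and on a real site the vocabulary would come from the indexed pages.

```php
<?php
// Hypothetical helper: return the vocabulary word closest to $query by
// Levenshtein distance, or null if nothing is within $maxDistance edits.
function suggest($query, array $vocabulary, $maxDistance = 2) {
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($vocabulary as $word) {
        // levenshtein() is case-sensitive, so compare lower-cased copies.
        $d = levenshtein(strtolower($query), strtolower($word));
        if ($d < $bestDistance) {
            $best = $word;
            $bestDistance = $d;
        }
    }
    return $best;
}

$vocabulary = array('management', 'jazz', 'concerts', 'tickets');
echo suggest('managment', $vocabulary), "\n"; // prints "management"
```

The cut-off matters: without it, any query, however unrelated, would be "corrected" to whichever word happens to be nearest.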
:
: Simon White has a nice introduction to the subject here:
: http://www.catalysoft.com/articles/MatchingSimilarStrings.html
: And he presents his own approach here:
: http://www.catalysoft.com/articles/StrikeAMatch.html
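The pair-counting score is the Dice coefficient applied to adjacent character pairs. Worked by hand, with $P(s)$ the multiset of adjacent letter pairs in $s$ (counted within the word and without padding; that counting convention is an assumption here):

```latex
\mathrm{sim}(s_0, s_1) = \frac{2\,\lvert P(s_0) \cap P(s_1) \rvert}{\lvert P(s_0) \rvert + \lvert P(s_1) \rvert}

P(\text{FRANCE}) = \{\text{FR}, \text{RA}, \text{AN}, \text{NC}, \text{CE}\}, \quad
P(\text{FRENCH}) = \{\text{FR}, \text{RE}, \text{EN}, \text{NC}, \text{CH}\}

\mathrm{sim}(\text{FRANCE}, \text{FRENCH}) = \frac{2 \times 2}{5 + 5} = 0.4
```

If each string is first padded with a leading and a trailing space, as the PHP code here does, each word gains two pairs (" F" plus "E " or "H "), giving 7 pairs each with 3 in common: 2 × 3 / 14 ≈ 0.43.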
:
: Below is a PHP version of the algorithm:
:
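: /**
:  * String similarity based on common adjacent character pairs.
:  * Note: works on bytes, so multibyte (e.g. UTF-8) characters are split.
:  * @param string $str0
:  * @param string $str1
:  * @return float similarity value, from 0 to 1
:  */
: function strSim($str0, $str1) {
:     $pairs = array();
:     foreach (array($str0, $str1) as $p => $s) {
:         $pairs[$p] = array();
:         // Collapse whitespace and pad with spaces so the first and last
:         // characters also form pairs.
:         $str = ' ' . trim(preg_replace('/\s+/', ' ', $s)) . ' ';
:         // Collect every adjacent pair, upper-cased for case-insensitivity.
:         for ($i = 0, $ii = strlen($str) - 1; $i < $ii; $i++) {
:             $pairs[$p][] = strtoupper(substr($str, $i, 2));
:         }
:     }
:     $intersection = 0;
:     $union = count($pairs[0]) + count($pairs[1]);
:     foreach ($pairs[0] as $pair) {
:         // Strict search, so numeric-looking pairs such as " 1" and "1 "
:         // are not treated as equal; remove each match so that a pair in
:         // $str1 is counted only once.
:         if (($key = array_search($pair, $pairs[1], true)) !== false) {
:             $intersection++;
:             unset($pairs[1][$key]);
:         }
:     }
:     // Dice coefficient: 2 * common pairs / (pairs in $str0 + pairs in $str1)
:     return 2 * $intersection / $union;
: }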
:
:
: hth,
: .k
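To get the "lower rating" behaviour described at the top of the thread, a similarity score can be used to rank candidates rather than simply accept or reject them. The sketch below uses PHP's built-in similar_text(), whose optional third argument receives a percentage; strSim() could be dropped in instead. `rankBySimilarity()` and the 40% cut-off are hypothetical.

```php
<?php
// Hypothetical helper: score each candidate against the query with
// similar_text()'s percentage, drop those below $minPercent and return
// the rest as word => score, best match first.
function rankBySimilarity($query, array $candidates, $minPercent = 40.0) {
    $scores = array();
    foreach ($candidates as $candidate) {
        // similar_text() is case-sensitive, so compare lower-cased copies.
        similar_text(strtolower($query), strtolower($candidate), $percent);
        if ($percent >= $minPercent) {
            $scores[$candidate] = $percent;
        }
    }
    arsort($scores); // highest score first, keys preserved
    return $scores;
}

print_r(rankBySimilarity('swim', array('swan', 'swum', 'swimming', 'jazz')));
```

Here "jazz" shares no characters with "swim" and is dropped, while "swan" is kept but ranked below "swum" and "swimming".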