[thelist] SEARCHING: algorithms, techniques, etc. to get better search results for users

Chris W. Parker cparker at swatgear.com
Mon Jul 8 18:05:00 CDT 2002


hi.

i work on an e-commerce site that lets users search our product
database. currently you can search using "all search terms", "any of the
search terms", or "exact match". the fields that get searched are the
product name and the short and long descriptions.
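a minimal sketch of those three modes, assuming each product is a dict with hypothetical `name`, `short_desc` and `long_desc` fields and that matching is a case-insensitive substring test (the post doesn't say how the real search matches). the sample products are made up; the accessory text is the one from the example below.

```python
# Sketch of the three search modes: "all search terms", "any of the
# search terms" and "exact match". Field names, sample data and the
# substring matching rule are assumptions, not the site's real code.

SEARCH_FIELDS = ("name", "short_desc", "long_desc")


def searchable_text(product: dict) -> str:
    """Join the searched fields into one lowercase string."""
    return " ".join(product.get(f, "") for f in SEARCH_FIELDS).lower()


def basic_search(products: list[dict], query: str, mode: str = "all") -> list[dict]:
    terms = query.lower().split()
    phrase = " ".join(terms)  # query with whitespace collapsed, for "exact"
    results = []
    for product in products:
        text = searchable_text(product)
        if mode == "all":
            hit = all(t in text for t in terms)
        elif mode == "any":
            hit = any(t in text for t in terms)
        elif mode == "exact":
            hit = phrase in text
        else:
            raise ValueError(f"unknown search mode: {mode!r}")
        if hit:
            results.append(product)
    return results


# Hypothetical sample products.
PRODUCTS = [
    {"name": "MK-46", "short_desc": "mk-46 widget",
     "long_desc": "the mk-46 widget itself."},
    {"name": "Widget Holster", "short_desc": "accessory",
     "long_desc": "this is an accessory for the mk-46 widgets..."},
]

print([p["name"] for p in basic_search(PRODUCTS, "46")])
```

because "46" is a substring of both records, the plain search returns both products in whatever order they are stored; nothing in these modes says which one is the better match.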

this is a pretty basic search, and it can sometimes return completely
unintended results.

let's say one product is named "MK-46" and another product has the
words "this is an accessory for the mk-46 widgets..." in its
description. if someone searches for "46", they will get both products.

of course, i don't know what the user intended to find. they could
have been looking for accessories for the mk-46, or they could have
been looking for the mk-46 itself.

when i went to amazon.com today and searched for "storytelling", there
were numerous matches, but the one /i/ was looking for was in the first
category, dvds.

i'm trying to come up with ways to improve searching the product
database; i think the default approach is too simple. on another site i
added a keyword field that contained misspellings and common misnomers
for different products.
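a sketch of the keyword-field idea, assuming a hypothetical per-product `keywords` string that is searched alongside the normal fields. the misspellings listed for the mk-46 are invented examples of what someone would maintain by hand.

```python
# Sketch: a hypothetical "keywords" field holding misspellings and
# misnomers, searched together with the regular fields.

SEARCH_FIELDS = ("name", "short_desc", "long_desc", "keywords")


def matches(product: dict, term: str) -> bool:
    """Case-insensitive substring match against every searched field."""
    term = term.lower()
    return any(term in product.get(field, "").lower() for field in SEARCH_FIELDS)


PRODUCTS = [
    {"name": "MK-46", "short_desc": "mk-46 widget",
     "long_desc": "the mk-46 widget itself.",
     # hand-maintained misspellings / misnomers (made-up examples)
     "keywords": "mk46 mk 46"},
    {"name": "Widget Holster", "short_desc": "accessory",
     "long_desc": "this is an accessory for the mk-46 widgets..."},
]

print([p["name"] for p in PRODUCTS if matches(p, "mk46")])
```

a search for "mk46" (no hyphen) now finds the mk-46 only because of its keyword entry; the accessory, which has no such entry, is not returned.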

this helps the user with misspellings and misnomers, but it doesn't
help when it comes to the importance of a product.

therefore i added an importance field and ranked the different products
by their "importance". with the products described above, the mk-46
could be the most important product in its category, and at the same
time the mk-46 accessory could be the most important accessory. so do
both products get the same importance? well no, they can't. but then how
can i determine intent? (i understand that i can't, exactly.)
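a sketch of sorting hits by an importance field, assuming a hypothetical integer `importance` per product where higher means more important. a single number per product only works as one ranking across all categories, which is the conflict described above; here equal values simply fall back to alphabetical order by name.

```python
# Sketch: order search hits by a hypothetical integer "importance" field
# (higher first). Equal importance falls back to alphabetical name order.

def rank_by_importance(hits: list[dict]) -> list[dict]:
    return sorted(hits, key=lambda p: (-p.get("importance", 0), p["name"]))


# Made-up hits and importance values.
hits = [
    {"name": "Widget Holster", "importance": 80},
    {"name": "MK-46", "importance": 95},
    {"name": "Cleaning Kit", "importance": 80},
]

print([p["name"] for p in rank_by_importance(hits)])
```

the importance values are invented; deciding them per product is the manual work the post goes on to worry about.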

what i would like to do is count the occurrences of a search term and
sort the products by the number of times the term is found in the
different fields. that should bring the closest matches to the top of
the list and the weakest matches to the bottom. at the same time, though,
an accessory could mention a product more often than the product's own
listing does.
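a sketch of ranking by occurrence count, assuming the same hypothetical `name`, `short_desc` and `long_desc` fields and case-insensitive, non-overlapping counts (python's `str.count`). the sample data is made up to show the caveat above: the accessory mentions "46" more often than the mk-46 listing does, so it ends up first.

```python
# Sketch: rank products by how often the search term occurs in the
# searched fields. Field names and sample products are made up.

SEARCH_FIELDS = ("name", "short_desc", "long_desc")


def occurrence_count(product: dict, term: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of term."""
    term = term.lower()
    if not term:
        return 0  # str.count("") would count every position
    return sum(product.get(f, "").lower().count(term) for f in SEARCH_FIELDS)


def rank_by_occurrences(products: list[dict], term: str) -> list[dict]:
    scored = [(occurrence_count(p, term), p) for p in products]
    hits = [(n, p) for n, p in scored if n > 0]  # drop non-matches
    hits.sort(key=lambda pair: pair[0], reverse=True)  # most occurrences first
    return [p for _, p in hits]


PRODUCTS = [
    {"name": "MK-46", "short_desc": "mk-46 widget",
     "long_desc": "the mk-46 widget itself."},
    {"name": "Widget Holster", "short_desc": "holster for the mk-46",
     "long_desc": "fits the mk-46, the mk-46 compact and the mk-46 long."},
]

for p in rank_by_occurrences(PRODUCTS, "46"):
    print(p["name"], occurrence_count(p, "46"))
```

with this data the holster scores 4 and the mk-46 scores 3, so a raw count alone puts the accessory above the product itself.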

or how about placing importance on words? for example, if someone
searched for "46", the occurrence of "46" in the mk-46 product would be
weighted more heavily than the occurrence of "46" in the accessory for
the mk-46.
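one way to approximate this, assuming that a match in the product name says more about intent than a match in a description, is to weight occurrences by field. the weights below (10 for name, 3 for short description, 1 for long description) are arbitrary assumptions, and the data is the same made-up pair as in the counting sketch.

```python
# Sketch: weight occurrences by the field they appear in. The weights
# and the sample products are assumptions chosen for illustration.

FIELD_WEIGHTS = {"name": 10, "short_desc": 3, "long_desc": 1}


def weighted_score(product: dict, term: str) -> int:
    term = term.lower()
    if not term:
        return 0
    return sum(weight * product.get(field, "").lower().count(term)
               for field, weight in FIELD_WEIGHTS.items())


def rank_weighted(products: list[dict], term: str) -> list[dict]:
    hits = [p for p in products if weighted_score(p, term) > 0]
    return sorted(hits, key=lambda p: weighted_score(p, term), reverse=True)


PRODUCTS = [
    {"name": "MK-46", "short_desc": "mk-46 widget",
     "long_desc": "the mk-46 widget itself."},
    {"name": "Widget Holster", "short_desc": "holster for the mk-46",
     "long_desc": "fits the mk-46, the mk-46 compact and the mk-46 long."},
]

for p in rank_weighted(PRODUCTS, "46"):
    print(p["name"], weighted_score(p, "46"))
```

the mk-46 now scores 10 + 3 + 1 = 14 against the holster's 0 + 3 + 3 = 6, so the product whose name contains the term comes first even though the accessory mentions it more often. the weights still have to be picked by hand, but they are set once per field rather than per product.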

unfortunately, all of these things require manual attention on my side,
and i don't have time to keep up with all the changes that might occur,
let alone the initial population of the database.

any ideas, suggestions, etc.?


thanks!
chris.


