[thelist] SEARCHING: algorithms, techniques, etc. to get better search results for users

Steve Lewis slewis at macrovista.net
Mon Jul 8 18:59:01 CDT 2002


Chris W. Parker wrote:

> let's say i have the name of a product is "MK-46" and another product
> has the words "this is an accessory for the mk-46 widgets..." in its
> description. if someone searches for "46" they will get both products.
You cannot divine the user's purpose.  If they search for MK-46 you need
to give all matches.  The only thing you should probably worry about
influencing is the order of appearance.  If the person didn't want
accessories, and that is all they see on the first page, the search
failed.  If they wanted accessories and they don't see any on the first
page, the search failed.  Identifying a better search order than these
two is left as an exercise for the reader.

> any ideas, suggestions, etc.?

The search is spanning many DB fields: product name, description,
keywords, etc., so weight some fields more than others as a matter of
principle.  Something like (# of occurrences in title * 4) + (# of
occurrences in description * 1.2) + (# of occurrences in keywords * 0.6).
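
A minimal Python sketch of that formula, just to make it concrete.  The
multipliers are the ones above; the field names (title, description,
keywords), the dict layout and the function names are my own
assumptions, not anything from Chris's actual schema.

```python
import re

# Multipliers from the formula above; tune them for your own data.
FIELD_WEIGHTS = {"title": 4.0, "description": 1.2, "keywords": 0.6}


def count_occurrences(term, text):
    """Count case-insensitive, non-overlapping occurrences of term in text."""
    if not term:
        return 0
    return len(re.findall(re.escape(term), text, flags=re.IGNORECASE))


def score(term, product):
    """Sum of (occurrences in field * field weight) over the weighted fields."""
    return sum(
        count_occurrences(term, product.get(field, "")) * weight
        for field, weight in FIELD_WEIGHTS.items()
    )


def search(term, products):
    """Return every matching product, highest score first."""
    scored = [(score(term, p), p) for p in products]
    hits = [(s, p) for s, p in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits]
```

With a product titled "MK-46" and an accessory that only mentions mk-46
in its description, a search for "46" still returns both, but the
product itself sorts first because the title match counts for more.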

How about adding a record-specific field for product weight (added to
the search weight, for instance) so marketing can have an easier time
pushing a hot new gizmo model over an older one?  Simply assign a higher
product weight in the DB and voila, it appears higher in the results!
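
Roughly, in Python, assuming the search score has already been computed
and that product_weight is a hypothetical per-record column.  I only
add the boost to products that actually matched, so a heavily promoted
gizmo does not show up in unrelated searches; that guard is my own
choice, not a requirement.

```python
def final_score(search_score, product_weight=0.0):
    """Add the per-product weight to the search score, for real matches only."""
    if search_score <= 0:
        return 0.0
    return search_score + product_weight


def rank(candidates):
    """candidates: list of (name, search_score, product_weight) tuples.

    Returns the names of matching products, highest final score first.
    """
    scored = [(final_score(s, w), name) for name, s, w in candidates]
    scored = [(s, name) for s, name in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

Two models with the same search score then sort by whatever weight
marketing assigned, and a non-matching product stays out no matter how
high its weight is.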

You might also lower the weight (counting it as 0.5 occurrences, for
instance) when the search term appears within another word (a search
for 46 scores 0.5 occurrences in "mk-46").  This is not for the faint
of heart.
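
One way to sketch it in Python.  I am assuming a "word" is anything
between whitespace with surrounding punctuation stripped, so "mk-46"
is a single word and "46" is only a partial match inside it.  The 0.5
is the figure from above; everything else here is illustrative.

```python
import string

PARTIAL_WEIGHT = 0.5  # value suggested above for a term buried in a longer word


def weighted_occurrences(term, text):
    """Count whole-word matches as 1.0 and embedded matches as PARTIAL_WEIGHT."""
    term = term.lower()
    if not term:
        return 0.0
    total = 0.0
    for raw in text.lower().split():
        # Strip leading/trailing punctuation: "widgets..." becomes "widgets".
        token = raw.strip(string.punctuation)
        if token == term:
            total += 1.0
        elif term in token:
            total += PARTIAL_WEIGHT
    return total
```

The result can stand in for the plain occurrence count in whatever
per-field formula you settle on.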

What weight algorithm should you use?  That depends on the writing style
of the individual(s) who enter the products into the DB, the users who
will be searching, and what display order the marketing folks want to see.

Do the authors tend to use a lot of repetition between keywords and title
but underweight words that appear in the description?  Maybe you don't
need to inflate the multiplier on the title field as much.  Experiment
and see what you can find.  Start with a very simple mathematical model,
and add complexity as needed.

You can probably make a decent effort at weighting content correctly by
thinking as a customer and doing some searches.  Look at a product and
punch in the first identifying symbols you see on the product (model
names, product class, etc).  Browse websites that review or provide
directories of similar products.  What terms do you frequently see used
to describe them?  What cultural and regional differences among your
customer base may change the language used?

Consider researching the tool's effectiveness later, when you have search
data from actual users: what terms do they tend to use, what sort of
results do they get, and why?  Observe the browsing history.  If the
user continues searching immediately after one set of search terms, you
can bet that the first search did not give the expected results.
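
If you log searches, spotting those immediate re-searches can be as
simple as the Python sketch below.  The log layout (session id,
timestamp in seconds, query) and the 60-second window are assumptions
for illustration; use whatever your logs actually record.

```python
from collections import defaultdict


def quickly_refined_queries(log, window_seconds=60):
    """Count how often each query was followed by a different search
    in the same session within window_seconds.

    log: iterable of (session_id, timestamp_seconds, query) tuples.
    """
    by_session = defaultdict(list)
    for session_id, ts, query in log:
        by_session[session_id].append((ts, query))

    counts = defaultdict(int)
    for events in by_session.values():
        events.sort()  # chronological order within the session
        for (ts, query), (next_ts, next_query) in zip(events, events[1:]):
            if next_ts - ts <= window_seconds and next_query != query:
                counts[query] += 1
    return dict(counts)
```

Queries with high counts are the ones worth trying yourself to see
what the first page of results looks like.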




