[thelist] Weighted Grouped Select

Joshua Olson joshua at alphashop.net
Tue Apr 24 14:08:38 CDT 2001


Howdee all,

I have a query that looks something like this:

SELECT
    obj.name AS obj_name,
    obj.id AS obj_id,
    Sum(data_type.search_weight) AS relevance,
  FROM data, obj, data_type
  WHERE obj.id = data.obj_id
    AND (Lower(data.value) LIKE '%keyword1%' OR
         Lower(data.value) LIKE '%keyword2%')
    AND data_type.id = data.data_type_id
    AND data_type.search_weight NOT IN (0, NULL)
  GROUP BY obj.id
  ORDER BY relevance DESC, obj_name

It's in a weighted search algorithm.  The data structure in general is
pretty straight forward.  I have one object record that stores all it's
pertinent data in another table.  The data in the detail table is organized
by data type, and some data types are more important than others.

This query works great for searching for keywords!  However, if both
keywords appear in one data cell, then the relevance isn't increased
accordingly.  For example, let's say the search weight for the object's name
was 2 and its color was one and have a single object called "blue ball" and
had a color of "blue".  If I search for two keywords, "blue" and "ball",
this algorithm would give a relevance of 3 (2 for the name, 1 for the
color), instead of 5 (4 points for both words in title + 1 more for color).

One answer is to use a separate query for each word in the search
parameters.  This method would work, except for the expense of combining the
results.  I'm using ColdFusion, so that would be expensive to say the least
unless I coded it in some language (Like C++) and used that native code for
processing of the results.

My preference would be to massage the SQL to accomplish approximately, if
not exactly, the same results.  Any ideas?

Did I explain this well enough for anybody to know how to tackle it?

-joshua





More information about the thelist mailing list