[thelist] Search closest match across multiple columns (mysql)
Phil Turmel
pturmel-webdev at turmel.org
Wed Apr 16 10:16:46 CDT 2008
Phil Turmel wrote:
> Dave,
>
> Matt Warden wrote:
>> On Tue, Apr 15, 2008 at 5:49 PM, Dave Stevens
>> <evolt at davestevens.co.uk> wrote:
>>> So, for example, assuming my table contains this data:
>>>
>>> name height weight dateofbirth Joe
>>> Bloggs 60 50 1980-05-02 John
>>> Smith 70 65 1972-01-17 Jane
>>> Doe 45 20 1987-12-14
>>>
>>> If my user entered:
>>>
>>> height: 50 weight: 35 date of birth: 1981-11-01
>>>
>>> My application would return Jane Doe first, then Joe Bloggs and
>>> finally John Smith.
>>
>> Why? We need to know how you determine that.
>>
>> Most likely, you are doing something like:
>>
>> select name from mytable order by (height-50)*weight1 +
>> (weight-35)*weight2 + (dob-'1981-11-01')*weight3 desc
>>
>> weight1, weight2, and weight3 determine how much height, weight, and
>> dob matter in determining a match. This is something you have to
>> determine. Do you have an idea of these?
>>
>
> Matt's recommendation is clearly the *Right Answer* (tm). If you're
> scratching your head as to what weightings to start with, I suggest you use
> the reciprocal of standard deviation as your weight. This will give you a
> reasonably "natural" spread on each criteria. You'll likely need to tweak
> the dateofbirth weighting some, as birth dates are not normally distributed
> in the general population.
>
> HTH,
>
> Phil
>
>
Hmmm. Let me amend this.... You'll want absolute values on each component,
and an ascending index.
Something like this:
SELECT name
FROM mytable
ORDER BY ABS(height-50)*W1 + ABS(weight-35)*W2 + ABS(dob-'1981-11-01')*W3
LIMIT 10
You can get starting points for W1, W2, and W3 with:
SELECT 1/STD(height) AS W1,
1/STD(weight) AS W2,
1/STD(dob-'2000-01-01') AS W3
HTH,
Phil
--
Need to contact me offlist?
Drop -webdev or you probably won't get through.
More information about the thelist
mailing list