[thelist] Search closest match across multiple columns (mysql)
Matt Warden
mwarden at gmail.com
Thu Apr 17 15:13:02 CDT 2008
On Thu, Apr 17, 2008 at 3:31 PM, Luther, Ron <Ron.Luther at hp.com> wrote:
> >>SS is a measure of variability within a data set. Taking the
> >>square root is just going to affect the value along this curve:
>
> >>So, yes, in a sense it is a distance.
>
> Yup, in a geometric sense. ;-PPPP
> http://www.purplemath.com/modules/distform.htm
>
> (Halfway down the page.)
It's interesting that we're approaching the same topic from two
different mindsets, and it seems to be causing each of us to miss
somewhat obvious things from the other's.
You are of course right. My mind was stuck in statistics, and the
"distance in a sense" I was talking about was a measure of
deviation... such that x% of the data set would have a distance within
the value you are calculating. After all, what you were coming up with
was very close to the formula for standard deviation.
If I had a few more brain cells, I might come to some amazing
epiphany about how these two ideas are really the same thing
mathematically. Alas...
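For what it's worth, here is a minimal sketch of how the two ideas seem to line up: the population standard deviation is the Euclidean distance between the data, treated as one point in n dimensions, and a point whose every coordinate is the mean, divided by sqrt(n). The sample values and function names are made up for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def population_std_dev(values):
    """Population standard deviation: sqrt(SS / n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Illustrative data only (not from this thread).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(data) / len(data)

# Treat the data set as a single point and compare it with the
# point (mean, mean, ..., mean).
mean_point = [mean] * len(data)
dist = euclidean_distance(data, mean_point)  # this is sqrt(SS)
sd = population_std_dev(data)

# The two quantities differ only by a factor of sqrt(n).
assert math.isclose(dist / math.sqrt(len(data)), sd)
```

So sqrt(SS) is a distance in the geometric sense, and the standard deviation is that same distance rescaled by the size of the data set.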
> You can try that as a first approximation. I think the problem that you run into is that
> on a binary item like gender the weight pretty much needs to be infinite. So if you
> are looking for a thin 5'5" tall guy who flashed kids in a park, then a fat 6'4" guy is a
> closer match than a medium build 5'5" tall gal even though she is the right height
> and weight.
If that is true, then gender does not need to be in the distance
calculation at all. We could just filter on it in the WHERE clause.
That would effectively segment the data set the way you suggest.
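A rough sketch of what I mean, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The table name, column names, and rows are all hypothetical, and the distance is left unweighted and squared (the square root does not change the ordering). The same query shape should carry over to MySQL, though the placeholder syntax depends on the driver.

```python
import sqlite3

# In-memory database standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suspects "
    "(name TEXT, gender TEXT, height_in REAL, weight_lb REAL)"
)
# Hypothetical rows loosely modeled on the example in this thread.
conn.executemany(
    "INSERT INTO suspects VALUES (?, ?, ?, ?)",
    [
        ("A", "M", 76, 280),  # heavy 6'4" guy
        ("B", "F", 65, 130),  # medium-build 5'5" gal
        ("C", "M", 66, 135),  # thin 5'6" guy
    ],
)

# What we are searching for (assumed values).
target = {"gender": "M", "height_in": 65, "weight_lb": 125}

# Gender is a hard filter in WHERE; only the continuous columns
# contribute to the (squared) distance used for ranking.
rows = conn.execute(
    """
    SELECT name,
           (height_in - :height_in) * (height_in - :height_in)
         + (weight_lb - :weight_lb) * (weight_lb - :weight_lb) AS dist_sq
    FROM suspects
    WHERE gender = :gender
    ORDER BY dist_sq
    """,
    target,
).fetchall()

for name, dist_sq in rows:
    print(name, dist_sq)
```

Row B never enters the ranking, no matter how close her height and weight are, which is the "infinite weight" behavior without having to put gender into the formula.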
> Yeah, this is MUCH neater than the stuff I am actually working on today!!
Amen.
--
Matt Warden
Cincinnati, OH, USA
http://mattwarden.com
This email proudly and graciously contributes to entropy.