On Thu, Apr 17, 2008 at 3:31 PM, Luther, Ron <Ron.Luther at hp.com> wrote:
> >>SS is a measure of variability within a data set. Taking the
> >>square root is just going to affect the value along this curve:
>
> >>So, yes, in a sense it is a distance.
>
> Yup, in a geometric sense. ;-PPPP
> http://www.purplemath.com/modules/distform.htm
>
> (Halfway down the page.)

It's interesting that we're both in our different mindsets on the same
topic and it seems to be causing us to miss somewhat obvious things
from the other mindset.

You are of course right. My mind was stuck in statistics, and the
"distance in a sense" I was talking about was a measure of
deviation... such that x% of the data set would have a distance within
the value you are calculating. After all, what you were coming up with
was very close to the formula for standard deviation.

If I had a few more brain cells, I perhaps would come to some amazing
epiphany about how these two ideas are really the same thing
mathematically. Alas...

> You can try that as a first approximation. I think the problem that you run into is that
> on a binary item like gender the weight pretty much needs to be infinite. So if you
> are looking for a thin 5'5" tall guy who flashed kids in a park, then a fat 6'4" guy is a
> closer match than a medium build 5'5" tall gal even though she is the right height
> and weight.

If that is true then it does not need to be in the distance
calculation. We could just include that in the where clause. That
would effectively segment the data set like you suggest.

> Yeah, this is MUCH neater than the stuff I am actually working on today!!

Amen.

--
Matt Warden
Cincinnati, OH, USA
http://mattwarden.com

This email proudly and graciously contributes to entropy.
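
For anyone following the thread, here is a rough Python sketch of the two ideas discussed above. It is only an illustration under assumptions: the record fields (`gender`, `height_in`, `weight_lb`) and the sample values are made up, and the distance is plain unweighted Euclidean distance, not whatever weighting the real query would need. The first part shows that the square root of SS is the Euclidean distance from the data to its mean, which differs from the standard deviation only by a factor of sqrt(n). The second part treats gender as a filter (the "where clause") instead of a term in the distance.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def population_std(values):
    """Population standard deviation: sqrt(SS / n)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # sum of squares about the mean
    return math.sqrt(ss / n)

def closest_matches(records, target, gender):
    """Filter on the binary attribute first, then rank by distance.

    `records` is a list of dicts with hypothetical keys 'gender',
    'height_in' and 'weight_lb'; `target` is a (height, weight) pair.
    """
    candidates = [r for r in records if r["gender"] == gender]  # the "where clause"
    return sorted(
        candidates,
        key=lambda r: euclidean((r["height_in"], r["weight_lb"]), target),
    )

# Made-up sample data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean_vector = [sum(values) / len(values)] * len(values)

# sqrt(SS) is the distance from the data to its mean vector,
# and equals sqrt(n) times the standard deviation.
dist_to_mean = euclidean(values, mean_vector)
scaled_std = math.sqrt(len(values)) * population_std(values)

records = [
    {"gender": "M", "height_in": 76, "weight_lb": 280},
    {"gender": "F", "height_in": 65, "weight_lb": 135},
    {"gender": "M", "height_in": 66, "weight_lb": 140},
]
ranked = closest_matches(records, target=(65, 130), gender="M")
```

With the filter in place, the woman in the sample data never enters the ranking, even though her height and weight are nearest the target, so no "infinite" weight on gender is needed.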