[thelist] Problem with distinct

Bill Moseley moseley at hank.org
Fri Sep 7 18:23:02 CDT 2007

On Fri, Sep 07, 2007 at 06:18:10PM -0400, Matt Warden wrote:
> On 9/7/07, Bill Moseley <moseley at hank.org> wrote:
> > > Distinct is evil
> >
> > Yes, I get caught be it often.  Not sure where the evilness is in my
> > example is, though.  Adding an extra layer as Stephen suggested does
> > seem to be a solution.
> There is no reason to use distinct in your example; that is where the
> evilness comes from. Also, using inner views or actual views is a
> solution, but again I don't see any reason to do it that way.

But, with the view I have I need some way to get "distinct" widgets.

> >  SELECT p.id, p.widget_name, max(c.color) AS color, max(s.size) AS size
> >    FROM product p
> >    JOIN color c ON c.product = p.id
> >    JOIN size s ON s.product = p.id
> >    WHERE color = 'blue'
> >    GROUP BY id, widget_name;
> >
> >  id | widget_name | color | size
> > ----+-------------+-------+-------
> >   1 | widget A    | blue  | small
> >   2 | widget B    | blue  | small
> > (2 rows)
> You definitely do not want to do this, as there is no guarantee that
> the the color and size that come up are a valid color-size
> combination. Maybe blue is only available in medium and large, for
> example. I was under the impression you only cared about the size.

The color and size are not very useful there, true.  Each widget might
have multiple colors or sizes.  That will require an extra select to
fetch each item's colors and sizes for final display.

But, the goal was to not list a widget more than once.

> Could you explain exactly the data you need out?

Sorry if my example was not very clear.

First imagine that all the attributes (columns) about a widget are in
a single table.  We want to display them a page at a time.  Then,
it's quite easy to use LIMIT and OFFSET (or whatever) and ORDER BY to
fetch a page of widgets.  The list can also be narrowed by any column,
of course.

That even works if you must do a number of joins to gather up all the
widget's attributes.

Now, if one of the joins results in multiple widgets begin returned
(e.g. a given widget is referenced by more than one row in the "color"
table) then a widget will be displayed twice, which we don't want.

So, need some way to group the widgets together.

Still need to use LIMIT to get a page at a time, and still need to be
able to sort and limit by any column.  Even a column like "color".

If limiting to WHERE color = 'red' then it's completely fine to sort
by color.  It doesn't really make sense, but as the client said: "90%
of widgets have only one color.  If a widget has two colors then it
doesn't matter which of its colors are used for sorting."

In simple terms, I want to do a join on all the tables (so I can say
WHERE color = 'red') but then throw out any duplicate widget ids --
and I don't care which are thrown out.  Then on this final table be
able to use LIMIT and ORDER BY.

Make sense?

Bill Moseley
moseley at hank.org

More information about the thelist mailing list