[thelist] Problem with distinct

Bill Moseley moseley at hank.org
Fri Sep 7 16:04:37 CDT 2007


This has come back to haunt me again.

I'm selling widgets, and each widget can have more than one size and
sometimes more than one color.

So, as you would expect, a list of widgets will include the
same widget more than once -- for each color and size.

We have a view that joins the tables, which is helpful:

    test=> select * from product_view;
     id | widget_name | color |  size  
    ----+-------------+-------+--------
      1 | widget A    | red   | large
      1 | widget A    | red   | medium
      1 | widget A    | red   | small
      1 | widget A    | blue  | large
      1 | widget A    | blue  | medium
      1 | widget A    | blue  | small
      2 | widget B    | red   | large
      2 | widget B    | red   | medium
      2 | widget B    | red   | small
      2 | widget B    | blue  | large
      2 | widget B    | blue  | medium
      2 | widget B    | blue  | small
    (12 rows)

Now, the discussion goes like this:

Client:  "I want a list of widgets, and I want people to be able to
select just "red" widgets, but I don't want to list a widget more
than once".


Well, you can't do this, of course, because you get the same widget
more than once due to the multiple sizes:

    test=> select * from product_view where color = 'red';
     id | widget_name | color |  size  
    ----+-------------+-------+--------
      1 | widget A    | red   | small
      1 | widget A    | red   | medium
      1 | widget A    | red   | large
      2 | widget B    | red   | small
      2 | widget B    | red   | medium
      2 | widget B    | red   | large
    (6 rows)


Ok, so this is probably the way to go:

    SELECT * FROM product WHERE product.id IN
        (
            SELECT id FROM product_view WHERE color = 'red'
        )
    ORDER BY widget_name;

     id | widget_name 
    ----+-------------
      1 | widget A
      2 | widget B
    (2 rows)


But here's the rub:

Client says:  "I want to be able to sort the widget list by its color".

Developer: "But you can't do that because a widget might have more
than one color.  Which color would would you sort by?"

Client: "90% of our widgets only have one color, so sort by that color.  For
the very few that have more than one color I don't care which one is used
for sorting."


Hum.  Don't like using DISTINCT, but give it a try.  It would be handy
to use that VIEW "product_view", after all, since there's a color
column to sort by:

    test=> select distinct on (id) * from product_view;
     id | widget_name | color | size  
    ----+-------------+-------+-------
      1 | widget A    | red   | large
      2 | widget B    | red   | large
    (2 rows)

Cool!  But wait:

    test=> select distinct on (id) * from product_view order by color;
    ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Darn, that's not very helpful.  What else to do?






-- 
Bill Moseley
moseley at hank.org




More information about the thelist mailing list