[thelist] Problem with distinct

Bill Moseley moseley at hank.org
Sat Sep 8 08:50:52 CDT 2007


On Sat, Sep 08, 2007 at 07:29:39AM -0400, r937 wrote:
> bill, there is no getting around the fact that if you are going to mush up
> your data (e.g. print "red" when in fact it comes in red and blue), you are
> going to have to live with unexpected and unintended consequences
> 
> a word to the wise is sufficient
> 
> if a widget comes in multiple sizes and colours, and you wish to show only
> one row for each widget, my advice is to go ahead and show only one row per
> widget, ~collapsing~ the multiplicity, ~not~ misrepresenting it (e.g. by
> showing only one colour when it has two)

I hope I was clear.  I'm not showing one color where there may be
many.  I'm doing another select to fetch *all* the colors once I have
the list of unique widgets.

The issue I probably under emphasized is that my search results come
in pages of, say, 20 results per page.  If my join results in multiple
widgets (because one has multiple colors) but I want to display a
widget only once then my OFFSET and LIMIT might be off (since I might
need to throw away a duplicate widget row).  Plus, if sorted by
something other than widget then the same widget might be on different
pages.

So, the challenge was to still be able to say "WHERE color = 'red'" to
limit by those rows, and then collapse that join into distinct widgets
*then* do my OFFSET and LIMIT to get correct paging.

> if you're using MySQL, you are halfway home -- use the GROUP_CONCAT function
> 
> with any other database, good luck getting the same effect as the
> GROUP_CONCAT function

I use Postgresql.  It does not have GROUP_CONCAT, but probably a more
flexible feature:  User-defined aggregates.

    http://www.postgresql.org/docs/current/static/xaggr.html
    http://www.postgresql.org/docs/current/static/sql-createaggregate.html

But, I'm using an ORM that doesn't support that currently.  It does
support calling a color method on the widget object to get all the
associated colors for the widget.  That's handy, but not very
efficient.

I should look into using the user-defined aggregates in this case,
although I'm not clear if I could still use WHERE to limit by, say,
color.


-- 
Bill Moseley
moseley at hank.org




More information about the thelist mailing list