[thelist] SQL descrete and count

Bill Moseley moseley at hank.org
Thu Jul 26 10:01:51 CDT 2007


On Thu, Jul 26, 2007 at 01:43:14AM -0400, Matt Warden wrote:
> >     SELECT DISTINCT ON (id) *
> >     FROM (
> >         SELECT *
> >         FROM foo
> >         WHERE some_col = ?
> >     ) AS sub_query;
> >
> > Which allow me to use a WHERE clause on all rows before the distinct.
> 
> Distinct is bad in most cases (this is one of them). Not sure what
> exactly is being attempted here, but if you just need a distinct set
> of IDs, try group by.

This is on Postgresql.  I actually did run the query plan for
distinct and group by, and both were very close in plan complexity and
speed.  But, that's probably due to the view which is joining 11
tables.

> If not, see the following:
> 
> http://www.onlamp.com/pub/a/onlamp/2004/09/30/from_clauses.html

Thanks, good read.  And shows how queries need to be tailored for
each use.  Not everything works equally well for every database,
obviously.

My view actually produces columns that are used from almost every
table.  There's a few joins that could be moved to the where clause,
but the trade off is the view is useful for a number of slightly
different pages in the application.  Plus, the where clause is
dynamically generated.  Plus, the query takes about 4ms on average, so
not an issue yet.


> 
> > Now, I also need a count of those rows:
> >
> >     SELECT count(*)
> >     FROM (
> >         SELECT DISTINCT ON (id) *
> >         FROM (
> >             SELECT *
> >             FROM foo
> >             WHERE some_col = ?
> >         ) AS sub_sub_query
> >     ) AS sub_query;
> >
> > Which works.  But, two nested queries makes me think I'm doing this
> > the hard way.  Is there a more natural way to do this?
> 
> count(*) is also bad (at least in Oracle, and I assume in general).
> You should not need to do this if all you are doing is getting a total
> rowcount of the result set. There should be built-in mechanisms to do
> this, either in your programming language or in the SQL dialect for
> your particular database.

I've seen comments about how bad count(*) can be -- haven't found a
good explanation yet this morning, but I'll keep looking.

Again, something that needs measuring, but the view returns quite a
bit of column data.  And some queries can return a few thousand rows.
Not very many rows, but the application displays them in pages of ten
rows each.  So the count(*) is used to get the total rows before using
LIMIT and OFFSET to fetch a page of rows.  I'm sure there's a more
efficient approach.

-- 
Bill Moseley
moseley at hank.org




More information about the thelist mailing list