[thelist] SQL question: Will this require a sub-select?

Bill Moseley moseley at hank.org
Sun Sep 7 09:30:08 CDT 2008


On Fri, Sep 05, 2008 at 08:13:40PM -0400, r937 wrote:
> > I'm not seeing any way around the sub-select.
> 
> why would you want a way around it?
> 
> that query looks okay to me
> 
> what happened when you tested it?

It's considerably slower than without the sub-select, of course.
About four times slower, it turns out.  (The EXPLAIN shows a few
sequence scans and I'm looking into why the indexes are not being
used.)

I'm looking at ways to improve, so my sanity check first
was to see if anyone here might have a more efficient way to write it.

As always, the problem is a bit more involved.

First, in real life it's a query that runs very often (to check if
some threshold has been exceeded), but changes rarely.  Although I try
and avoid de-normalization, I might want to save the counts in a
separate table and update that table from the select below only when a
new item is inserted or updated.


Second, the hierarchy is more complex, of course.  It's more like:

    An organization
        has many accounts
            has many departments
                has many users
                    has many books
                        has many chapters


And that 20,000 I hard-coded in the example:

    SELECT
        date_trunc( 'month', book.pub_date ) as month,
        COUNT( book.id ) as books,
        SUM( chptr.word_count ) AS total_words,
        SUM( chptr.units ) AS total_units

    FROM
        book
        JOIN (
            SELECT
                book,
                SUM(word_count) AS word_count,
                ( SUM( word_count ) / 20000 ) + 1 ) AS units
            FROM
                chapter
            GROUP BY
                book
        ) AS chptr ON chptr.book = book.id

    GROUP BY
        month


has to come from the organization table, along with another number
used in the outer query.  That means both the outer select and the
sub-select must join all they way up to the organization table.  So,
it's going to be slower yet.

If the query is as optimized at it can be, then I'll have to look at a
way to cache.


-- 
Bill Moseley
moseley at hank.org
Sent from my iMutt




More information about the thelist mailing list