[thelist] SQL question: Will this require a sub-select?
Bill Moseley
moseley at hank.org
Sun Sep 7 09:30:08 CDT 2008
On Fri, Sep 05, 2008 at 08:13:40PM -0400, r937 wrote:
> > I'm not seeing any way around the sub-select.
>
> why would you want a way around it?
>
> that query looks okay to me
>
> what happened when you tested it?
It's considerably slower than without the sub-select, of course.
About four times slower, it turns out. (The EXPLAIN shows a few
sequence scans and I'm looking into why the indexes are not being
used.)
I'm looking at ways to improve, so my sanity check first
was to see if anyone here might have a more efficient way to write it.
As always, the problem is a bit more involved.
First, in real life it's a query that runs very often (to check if
some threshold has been exceeded), but changes rarely. Although I try
and avoid de-normalization, I might want to save the counts in a
separate table and update that table from the select below only when a
new item is inserted or updated.
Second, the hierarchy is more complex, of course. It's more like:
An organization
has many accounts
has many departments
has many users
has many books
has many chapters
And that 20,000 I hard-coded in the example:
SELECT
date_trunc( 'month', book.pub_date ) as month,
COUNT( book.id ) as books,
SUM( chptr.word_count ) AS total_words,
SUM( chptr.units ) AS total_units
FROM
book
JOIN (
SELECT
book,
SUM(word_count) AS word_count,
( SUM( word_count ) / 20000 ) + 1 ) AS units
FROM
chapter
GROUP BY
book
) AS chptr ON chptr.book = book.id
GROUP BY
month
has to come from the organization table, along with another number
used in the outer query. That means both the outer select and the
sub-select must join all they way up to the organization table. So,
it's going to be slower yet.
If the query is as optimized at it can be, then I'll have to look at a
way to cache.
--
Bill Moseley
moseley at hank.org
Sent from my iMutt
More information about the thelist
mailing list