[thelist] Another SQL query
Bill Moseley
moseley at hank.org
Wed Sep 12 14:15:26 CDT 2007
Thanks Rudy for helping.
On Wed, Sep 12, 2007 at 12:53:40PM -0400, r937 wrote:
> > 2) how many tasks have been processed total
>
> select count(*)
> from (
> select task_parts.task_id_fk
> , count(*) as tasks
> from log
> inner
> join task_parts
> on task_parts.part_id = log.task_part_fk
> group
> by task_parts.task_id_fk
> ) as logs
> inner
> join (
> select task_parts.task_id_fk
> , count(*) as tasks
> from task_parts
> group
> by task_parts.task_id_fk
> ) as parts
> on parts.task_id_fk = logs.task_id_fk
> where logs.tasks > task.tasks
I assume that last line should be:
where logs.tasks > parts.tasks
Very cool, but that seems like it will count the number of tasks that
have been processed more than once, but not the total number of times
a task has be processed.
I'm looking for the total number of tasks that have run more than one
time.
Here's a join of the two tables:
test=> select *
from log
join task_parts ON log.task_part_fk = task_parts.part_id
order by task_parts.part_id;
log_id | task_part_fk | part_id | task_id_fk
--------+--------------+---------+------------
9 | 1 | 1 | 1
8 | 1 | 1 | 1
1 | 1 | 1 | 1
11 | 2 | 2 | 1
3 | 2 | 2 | 1
7 | 2 | 2 | 1
2 | 3 | 3 | 1
10 | 3 | 3 | 1
6 | 3 | 3 | 1
4 | 4 | 4 | 2
5 | 5 | 5 | 2
(11 rows)
You can see task #1 has three parts (1,2,3) and task #2 has two parts
(4,5).
And task #1 has been processed three times (each of task #1's parts is
logged three times) and task #2 only one time.
This is the correct count of tasks processed:
test=> select count(distinct task_parts.task_id_fk)
from log
inner
join task_parts
on task_parts.part_id = log.task_part_fk
;
count
-------
2
(1 row)
But the 2nd query shows that one task has been processed more than
once:
test=> select count(*)
from (
select task_parts.task_id_fk
, count(*) as tasks1
from log
inner
join task_parts
on task_parts.part_id = log.task_part_fk
group
by task_parts.task_id_fk
) as logs
inner
join (
select task_parts.task_id_fk
, count(*) as tasks2
from task_parts
group
by task_parts.task_id_fk
) as parts
on parts.task_id_fk = logs.task_id_fk
where logs.tasks1 > parts.tasks2
;
count
-------
1
(1 row)
Without the "where" I get 2. And without the where and select *
instead:
task_id_fk | tasks1 | task_id_fk | tasks2
------------+--------+------------+--------
2 | 2 | 2 | 2
1 | 9 | 1 | 3
(2 rows)
I wonder if there's a schema change that would support getting these
counts better. I'd rather not de-normalize the tables, but these are
counts that need to be read often but will change much less often.
Thanks again!
--
Bill Moseley
moseley at hank.org
More information about the thelist
mailing list