[thelist] Another SQL query

Wed Sep 12 14:15:26 CDT 2007

Thanks Rudy for helping.

On Wed, Sep 12, 2007 at 12:53:40PM -0400, r937 wrote:
> >  2) how many tasks have been processed total 
> 
> select count(*)
>   from (
>        select task_parts.task_id_fk
>             , count(*) as tasks
>          from log
>        inner
>          join task_parts
>            on task_parts.part_id = log.task_part_fk   
>        group
>            by task_parts.task_id_fk 
>        ) as logs
> inner
>   join (
>        select task_parts.task_id_fk
>             , count(*) as tasks
>          from task_parts
>        group
>            by task_parts.task_id_fk 
>        ) as parts
>     on parts.task_id_fk = logs.task_id_fk
>  where logs.tasks > task.tasks

I assume that last line should be:

    where logs.tasks > parts.tasks

Very cool, but that seems like it will count the number of tasks that
have been processed more than once, but not the total number of times
a task has be processed.

I'm looking for the total number of tasks that have run more than one
time.

Here's a join of the two tables:

    test=> select *
    from log
    join task_parts ON log.task_part_fk = task_parts.part_id
    order by task_parts.part_id;

     log_id | task_part_fk | part_id | task_id_fk 
    --------+--------------+---------+------------
          9 |            1 |       1 |          1
          8 |            1 |       1 |          1
          1 |            1 |       1 |          1
         11 |            2 |       2 |          1
          3 |            2 |       2 |          1
          7 |            2 |       2 |          1
          2 |            3 |       3 |          1
         10 |            3 |       3 |          1
          6 |            3 |       3 |          1
          4 |            4 |       4 |          2
          5 |            5 |       5 |          2
    (11 rows)

You can see task #1 has three parts (1,2,3) and task #2 has two parts
(4,5).

And task #1 has been processed three times (each of task #1's parts is
logged three times) and task #2 only one time.

This is the correct count of tasks processed:

    test=> select count(distinct task_parts.task_id_fk)
      from log
    inner
      join task_parts
        on task_parts.part_id = log.task_part_fk
    ;
     count 
    -------
         2
    (1 row)

But the 2nd query shows that one task has been processed more than
once:

    test=> select count(*)
      from (
           select task_parts.task_id_fk
                , count(*) as tasks1
             from log
           inner
             join task_parts
               on task_parts.part_id = log.task_part_fk
           group
               by task_parts.task_id_fk
           ) as logs
    inner
      join (
           select task_parts.task_id_fk
                , count(*) as tasks2
             from task_parts
           group
               by task_parts.task_id_fk
           ) as parts
        on parts.task_id_fk = logs.task_id_fk
     where logs.tasks1 > parts.tasks2
    ;
     count 
    -------
         1
    (1 row)

Without the "where" I get 2.  And without the where and select *
instead:

 task_id_fk | tasks1 | task_id_fk | tasks2 
------------+--------+------------+--------
          2 |      2 |          2 |      2
          1 |      9 |          1 |      3
(2 rows)

I wonder if there's a schema change that would support getting these
counts better.  I'd rather not de-normalize the tables, but these are
counts that need to be read often but will change much less often.

Thanks again!

-- 
Bill Moseley
moseley at hank.org