[thelist] Incorrect SQL?

Sat Apr 6 20:16:01 CST 2002

> If the column in the GROUP BY is the PK, and the non-aggregate
> fields in the SELECT clause are from the same table, then you *should*
> be able to select them with aggregate functions, right?

i understand what you're saying

if "pk" is the primary key for tablea, and assuming a 1-to-many
relationship to tableb, the following should return the number of related
tableb rows in the count --

      select a.pk, a.xx, a.yy, count(*)
         from tablea a
               , tableb b
       where a.pk = b.fk
    group by a.pk

> If the GROUP BY is on the PK, then every row is returned anyways,
> so returning detail should not matter!  Am I just totally off-base?
> Does this logic vary from db to db?

you're saying the presence of the xx and yy columns in the select list
would not materially change which rows are in each group, and you're right

however, other than mysql[1], no other database (that i know of) allows
this, and would flag it as an error

the smart thing is simply always to code it as

  group by a.pk, a.xx, a.yy

because this works in all databases

yes, i know it's a pain when the select list contains expressions such as

  DATE_FORMAT(entryDate,'%d-%m-$Y') as entryDate

but that's why views were invented   ;o)

>> count(*), joshua, counts rows in each group
>
> From a purely intellectual standpoint, I don't like this shortcut
> because it just doesn't accurately describe what is happening.
> It seems to be sort of a hack.

i'm sorry, i don't understand why you're calling this a shortcut or hack

count(*) is defined as the number of rows

whatever could it be a shortcut or hack for?  what do you think is
happening?

rudy

[1] here's the mysql doc on the subject --

http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html

    MySQL has extended the use of GROUP BY.
    You can use columns or calculations in
    the SELECT expressions that don't appear
    in the GROUP BY part. This stands for any
    possible value for this group. You can
    use this to get better performance by
    avoiding sorting and grouping on unnecessary
    items. For example, you don't need to group
    on customer.name in the following query:

    mysql> select order.custid,customer.name,max(payments)
        ->        from order,customer
        ->        where order.custid = customer.custid
        ->        GROUP BY order.custid;

    In ANSI SQL, you would have to add customer.name
    to the GROUP BY clause. In MySQL, the name is
    redundant if you don't run in ANSI mode.

    DON'T USE THIS FEATURE if the columns you omit
    from the GROUP BY part aren't unique in the group!
    You will get unpredictable results.

the caps in the last paragraph are bold in the docs -- heavy emphasis