[thesite] database: content table

Sat May 26 00:07:30 CDT 2001

> I love steadfast rules like the one
> you are implying in that first sentence.

thanks, joshua

not sure i can do the subject justice in one post (database books usually
devote an entire chapter to group by and having) but here goes...

suppose you have a database of city data

create table census
  (cityname  char(50)
  , county  char(2) references countytable
  , state  char(2) references statetable
  , men  integer
  , women integer
  , dogs integer
  , cats integer)

each city has foreign keys to its state and county, plus some demographic
data

there's one row per city

now let's look at a query

  select county, count(*)
      from census
   group by county

you get one row per group, i.e. one per county, and it contains the number
of cities -- actually rows, but there's one row per city in the table -- in
each county

try another

  select state, sum(dogs)
   from census
     group by state

again, one row per group, i.e. one row per state

now it gets trickier

  select  state, county, avg(men), avg(women)
    from census
      group by state, county

notice that the columns in the group by are also in the select list

you get one row per county per state, and the averages in this case are
city averages

the rows in the result set produced by a group by operation are similar to
the "subtotals" rows you get in report writer programs if you turn "details
off" (if you know what i mean)

it's kinda like you did a select distinct on the group by columns, to get
all the different combinations, but at the same time, producing subtotals
(counts, sums averages, etc.) for each distinct group

and don't forget the having clause, it can be quite useful too  --

  select cityname, count(state)
     from census
    group by cityname
     having count(state)>1

finds all citynames that exist in more than one state (tricky, but it's a
byproduct of the one-to-many relationship from state to city, nobody said
city names had to be unique)

what about this --

   select sum(dogs), sum(cats)
     from census

where's the group by??  don't really need one!!

the entire table is the group, so you only get one row back

see, any time you have a group by, the columns in the group by have to be
in the select list too

  select state, sum(cats)
    from census

bzzzzt,  this gives an error, you can't have an aggregate function without
the group by if there's a plain old ordinary column in the select list

you'd think that the database would be smart enough to just plug in the
group by columns whenever you use an aggregate function in the select list,
but noooooo....

finally, one caveat -- MySQL is a bit wacky when it comes to groups

7.4.13  Functions for Use with GROUP BY Clauses
http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Gr
oup_by_functions">

   MySQL has extended the use of GROUP BY. You can use
   columns or calculations in the SELECT expressions that don't
   appear in the GROUP BY part. This stands for any possible value
   for this group. You can use this to get better performance by
   avoiding sorting and grouping on unnecessary items. For example,
   you don't need to group on customer.name in the following query:

   mysql> select order.custid,customer.name,max(payments)
         from order,customer
         where order.custid = customer.custid
         GROUP BY order.custid;

   In ANSI SQL, you would have to add customer.name to the
   GROUP BY clause. In MySQL, the name is redundant if you
   don't run in ANSI mode.

   <STRONG>Don't use this feature</STRONG> if the columns
   you omit from the GROUP BY part aren't unique in the group!
   You will get unpredictable results.

i personally have not had a chance to play with this wackiness, and i'm not
eager to...

helps?

rudy