[thesite] rudy, this may intersest you..
rudy
r937 at interlog.com
Fri Oct 5 12:46:54 CDT 2001
you guys kill me
it takes a lot longer than 21 days, eh
i still haven't got it right ;o)
> select who, lastlogin, createdate
> from users
> group by email
> having count(*) > 1
>
> is giving me a "Not a group by" expression
as well it should
i understand what you want, though
> I don't think that'll work because you cannot
> return this sort of row level detail when you have
> a group by in there.
exactly, joshua
think about what happens with a group by -- the rows
in the group are collapsed, preserving the values of
the group by columns
> select who, lastlogin, createdate
> from users
> where exists
> (select 1 from users u2
> where u2.email = users.email
> and u2.userid != user.userid)
>
> Or something like that.
indeed, that's pretty close -- in fact it does give you the
right answer, except that it includes every user twice!
(he said, confidently, not having bothered to test it)
> something like:
>
> select email, who, lastlogin, createdate
> from users
> group by email, who, lastlogin, createdate
> having count(email) > 1
good try scott, but this will probably come up empty -- you're looking for
groups where they have at least two rows with the same email and who and
lastlogin and createdate
and way to go, hinting at null emails... as if group by wasn't tricky
enough, now you gotta bring up the issue of nulls?
i love nulls, but they are really hard for a lot of people to grasp
> I do not think this will work either because the GROUP BY
> forces all rows that have the same email to be reduced down
> into one row.
exactly -- i say collapsed, you say reduced ;o)
key point: the database gives you one row per group
the only actual column values you can access are the ones that are
guaranteed to be the same for every row in the group -- and that
would be the columns in the group by column list
you can also access column functions for the group
column functions are most often called aggregate functions, as they operate
on a set of values -- the column of values for the rows of the group
(scalar functions like substring, month, cast, and so on, operate on a
single, scalar value in a single row)
so definitely you want to group by email
the trick is to make the grouping a subquery, similar to where joshua was
going, but using IN instead of EXISTS
select email, who, lastlogin, createdate
from users
where email in
(select email
from users
group by email
having count(*) > 1)
the subquery returns all the duplicated emails, while the outer query
returns all the rows that have those emails
a good optimizer will do it in a single pass using a join, but don't you
try to write it as a join, you'll only hurt yourself
i think count(*) is faster, but count(email) -- and i'd want to test this
to make sure -- produces the same result, only it never returns a null
group
if there were tons of null emails in the table (and in our case there
aren't), this might be a good idea, as you wouldn't want the outer query to
return all the users without an email (assuming there was more than one),
but it would probably not occur to me to write count(email), i'd probably
put a filter in the subquery --
select email, who, lastlogin, createdate
from users
where email in
(select email
from users where email is not null
group by email
having count(*) > 1)
rudy
More information about the thesite
mailing list