[thelist] Simple table question...

Matt Warden mwarden at gmail.com
Mon May 9 19:20:23 CDT 2005


On 5/9/05, Jonathan Dillon <jdillon at boehm-ritter.com> wrote:
> >On 5/9/05, Jonathan Dillon <jdillon at boehm-ritter.com> wrote:
> >> SELECT DISTINCT(email), firstname, lastname, phone FROM tablename;
> >>
> >> This is rather retarded, but why would this pull up duplicate emails?
> >> I just don't understand.  I thought DISTINCT would explicitly be
> >> distinct to the column name.
> >
> >What is it to do with the following data?
> >
> >joe at blow.com Joe Blow 555-555-5555
> >joe at blow.com Joe Blow 555-555-5556
> >
> >What row is it supposed to return?
> >
> >The reality is that every unique *combination* of fields listed after the
> DISTINCT keyword will be returned.
>
> Matt:
>
> Doh!
>
> Yea, that's exactly what the data is doing.  So, would changing the order
> help?
>
> SELECT firstname, lastname, phone, DISTINCT(email) FROM tablename;

Did you try it? I suspect this does not work (you may even get a
syntax error). The database still will not know what to do with
"duplicate" rows.

> How do you get around this seemingly simple problem?  I've got about 15% too
> many records!  I need records distinctly keyed off their emails... ;-)

Then you should clean out the bad data and put a unique constraint on
the email column so this can't happen in the future. Even if you were
to find a solution that gives you unique emails, your data is going to
be bad (see example I provided and assume that one of those phone
numbers is not accurate).


--
Matt Warden
Miami University
Oxford, OH, USA
http://mattwarden.com

This email proudly and graciously contributes to entropy.


More information about the thelist mailing list