[thelist] MySQL oddness

Phil Turmel pturmel-webdev at turmel.org
Wed Jan 30 15:37:58 CST 2008


Hi Matt,

Matt Warden wrote:
> On 1/29/08, Phil Turmel <pturmel-webdev at turmel.org> wrote:
>>> [1]
>>> http://en.wikipedia.org/wiki/Projection_(relational_algebra)
>>>
>> Projection is merely an implementation of DISTINCT or GROUP BY.
>> If your query doesn't call for it, you might get duplicate rows.
>> If your query does call for it, which columns are involved
>> affects the results.  If your application needs projection for
>> correctness, you'd better be explicit.
> 
> Phil, I think there is some confusion here. You are correct that you
> will get duplicates if you do not include a group operation, but that
> is because the database is diverging from relational algebra
> principles in that regard. A projection is indeed an operation that
> takes a tuple (row) and re-orders and/or restricts the attributes
> (columns) contained in the relation (table). It is an operation that
> results from the *select* phrase in the SQL and NOT the group phrase.

Here I disagree.  Projection is *defined* to exclude duplicates, 
per the set notation in the reference you supplied.  Duplicate 
removal won't occur in a simple query without a DISTINCT or GROUP 
BY clause.  Column removal by itself is at least cheap, probably 
free, and in many cases negative cost.  Duplicate row removal 
might be free, but is likely to be expensive.

> I don't know the history behind the decision to diverge from
> relational algebra in this manner, but I can assume that it was likely
> a usability concern that has to do with end users being confused at
> why cardinalities were changing just because they've selected
> different columns.

I can't speak to the *entire* history of relational database 
development, but I don't think it's a question of divergence.  I 
don't think SQL and relational algebra ever *converged* as you 
describe.

Back to the points about wildcards (not all by me):  There are 
maintenance, robustness, and performance advantages to avoiding 
wildcarded column lists, particularly in distributed 
environments.  Parsing costs are negligible.

> 
> Thanks,
> 

Warm Regards,

Phil Turmel

-- 
Need to contact me offlist?
   Drop -webdev or you probably won't get through.



More information about the thelist mailing list