[thelist] mysql request return unique row

Matt Warden mwarden at gmail.com
Thu Sep 21 08:12:45 CDT 2006


On 9/21/06, Pierre-Henri Lavigne <lavigne at getphuture.com> wrote:
>
> Hi,
>
> For a search engine with php I send a mysql request joining multiple tables to
> get results about candidates. The fact is with specific options I can get
> multiple times the same rows. How can I do get unique candidate_id for
> examples ?

A simplified/readable version:

SELECT ca.candidate_id, cask.diploma_id
FROM candidates as ca, candidates_skills as cask,
	candidates_to_events as caev, candidates_to_states as cast,
	candidates_to_work_categories as cawc
WHERE cask.language_1_id='7'
	AND caev.event_id='2'
	AND ( cawc.work_category_id='7'
		OR cawc.work_category_id='10'
		OR cawc.work_category_id='24'
	)
	AND cast.state_id='14'
	AND ca.candidate_visibility='0'
	AND ca.candidate_id=cask.candidate_id
	AND ca.candidate_id=caev.candidate_id
	AND ca.candidate_id=cast.candidate_id
	AND ca.candidate_id=cawc.candidate_id
	AND DATE(ca.candidate_date_added)<CURRENT_DATE()

Let's say you have records for a candidate where there is 1 candidate
record, 3 candidate skills (1 of which is language=7), 2 candidate
events (1 of which is event 2), 1 canidate state (state 14), and 3
candidate work categories (which happen to be 7, 10, and 24). You are
going to get 1 candidate * 1 skill * 1 event * 1 state * 3 categories
= 3 records for that candidate, because you are joining across these
tables.

+---------------+-----------+
| Candidate Joe	| Category1 |
| Candidate Joe	| Category2 |
| Candidate Joe	| Category3 |
+---------------+-----------+

Now, because you are not selecting the category, I assume you don't
really care about it... you're just using it to filter. You can either
require a unique row by using DISTINCT (generally a poor solution, see
google, e.g.: http://www.onlamp.com/pub/a/onlamp/2004/09/30/from_clauses.html),
use a group by (equally poor solution for similar reasons), or rewrite
the query to return a single result per candidate.

The latter would involve a subquery, which hopefully your RDBMS
supports. As I believe is discussed in the onlamp article, the
optimizer often will transform your subquery version back into a join
anyway, but the point there is that the optimizer decides that is a
good idea, and still returns to you the result as if it were a
subquery (vs. discarding "duplicate" results returned with a DISTINCT
or grouping).


-- 
Matt Warden
Cleveland, OH, USA
http://mattwarden.com


This email proudly and graciously contributes to entropy.



More information about the thelist mailing list