[thelist] Truth about Outer Joins vs. Union All in Oracle?

Phil Turmel pturmel-webdev at turmel.org
Fri Nov 16 10:42:34 CST 2007


Matt Warden wrote:
> listers:
> 
> There are rumors that outer joins should be avoided in Oracle in favor
> of a semantically equivalent union all. I have sort of taken this as
> truth, but the more I think about it the more I am not convinced. I
> have done some googling, but have not found what I'm looking for.
> 
> The following are two versions of the same query, one with a left
> outer join and the other using a union all. After each query is the
> execution plan chosen by the cost-based optimizer.
> 
> Could someone with better experience reading execution plans confirm
> that the union all should perform better in general (and an
> explanation why, if possible)?
> 
> Probably want to view this in fixed font...
> 
> select nbr_ssn
> from data_exch.t_exch_data_header d
> left join data_exch.t_exch_match_header m
>             on d.idn_header_data_exch = m.idn_header_data_exch;
> 
> Operation                Name                   Rows    Bytes   Cost
>  CPU Cost   IO Cost
> SELECT STATEMENT                                35M     640M    113506
>  16G        112293
>  HASH JOIN RIGHT OUTER                          35M     640M    113506
>  16G        112293
>   INDEX FAST FULL SCAN   PK_T_EXCH_MATCH_HEADER 1752K   10M     1722
>  251M       1703
>   PARTITION HASH ALL                            35M     438M    61548
>  8G         60932
>     TABLE ACCESS FULL    T_EXCH_DATA_HEADER     35M     438M    61548
>  8G         60932
> 
> 
> select nbr_ssn
> from data_exch.t_exch_data_header d
> inner join data_exch.t_exch_match_header m
>             on d.idn_header_data_exch = m.idn_header_data_exch
> union all
> select nbr_ssn
> from data_exch.t_exch_data_header d
> where not exists (
>             select *
>             from data_exch.t_exch_match_header m
>             where d.idn_header_data_exch = m.idn_header_data_exch
> );
> 
> Operation                  Name                 Rows    Bytes
> Cost	CPU Cost   IO Cost
> SELECT STATEMENT                                35M     641M    227011
>  16G        112293
>   UNION-ALL
>    HASH JOIN                                    1752K   31M     113506
>  16G        112293
>     INDEX FAST FULL SCAN   PK_T_EXCH_MATCH_HEAD 1752K   10M     1722
>  251M       1703
>     PARTITION HASH ALL                          35M     438M    61548
>  8G         60932
>      TABLE ACCESS FULL     T_EXCH_DATA_HEADER   35M     438M    61548
>  8G         60932
>    HASH JOIN RIGHT ANTI                         33M     609M    113506
>  16G        112293
>      INDEX FAST FULL SCAN  PK_T_EXCH_MATCH_HEAD 1752K   10M     1722
>  251M       1703
>      PARTITION HASH ALL                         35M     438M    61548
>  8G         60932
>       TABLE ACCESS FULL    T_EXCH_DATA_HEADER   35M     438M    61548
>  8G         60932
> 
> 
> All I can really take out of this is that the I/O cost of the second
> is about twice the first (I assume because of the second table scan of
> the table with  > 30 million records in it!). But I'm sure I'm only
> considering one small piece of the puzzle. Any thoughts?
> 
> Is this a special case where the outer join is better? What about the
> general case?
> 
> Thanks!
> 
Hi Matt,

I don't see in the execution plan any indication that your join 
column is indexed (idn_header_data_exch) in the data table 
(idn_header_data_exch).  (?!?!) Unless I've misread something, 
that forces a full table scan in both halves of the union.

FWIW, I don't bother to convert outer joins to unions, ever 
(yet).  The loss of maintainability is just too expensive.  If I 
ever run into a tough case that absolutely needs this, I'd 
arrange for the FK column to be NULL when appropriate, and scan 
on that in the single-table half of the union.

Phil
-- 
Need to contact me offlist?
   Drop -webdev or you probably won't get through.



More information about the thelist mailing list