[thelist] sql/table design question

Wed Feb 19 06:59:01 CST 2003

hi joshua, thanks for the examples

yeah, i've seen them before   ;o)

> SELECT Book2 AS RelatedBook
> FROM table
> WHERE Book1 = 1234
>
> This is obviously going to run a lot faster assuming the same number
> of rows.  Of course, we'll have double the rows, but I'd bet it's STILL
> faster.

gut feel isn't good enough

the situation calls for a benchmark against a real table

> I understand your concerns about consistency.  Perhaps some combination
> of FK's would help ensure that the inconsistencies don't occur?

not really

i should hope you'd have FKs defined regardless, eh

no, the inconsistency occurs when one of the two rows fails to get inserted
initially, or is later deleted

like i said, you'd better have rock solid app logic

"how could i prevent a situation where one gets deleted but not the other?"
i can hear you thinking, and in a second you'll be suggesting a transaction
block (eewwww)

ah, i see jeff has an answer, similar to the one i would have suggested, had
i not been catching zeds at 4 in the morning

> SELECT DISTINCT relatedBook
>   FROM (SELECT book2 AS relatedBook
>           FROM table
>          WHERE book1 = 1234
>          UNION ALL
>         SELECT book1 AS relatedBook
>           FROM table
>          WHERE book2 = 1234)
>  ORDER BY relatedBook
>
> this has the advantage of not resulting in duplicate relatedBook records,
> should the data make that possible, which yours does not address
> (without adding DISTINCT in front of RelatedBook).

"should the data make that possible?"

well, yeah -- the double row scenario has duplicates by design!

     1234  6789
     6789  1234

a more straightforward way is to use UNION instead of UNION ALL and then the
outer select is not necessary

however, both UNION and DISTINCT require a sort of the entire result set, so
i imagine they would both perform the same way -- slower

if, on the other hand, there's only one row for each relationship, you can
use UNION ALL and avoid the sort --

    select book2 as relatedBook
      from relatedbooks
     where book1 = 1234
    union all
    select book1
      from relatedbooks
     where book2 = 1234

this should run quite well, and note that while there are two passes of the
data where the OR syntax requires only one, each subquery in the above
UNION ALL can utilize a separate index, whereas the OR syntax would
likely utilize index matching -- if you're lucky

i would still like to see benchmark results, though

and why can you avoid the sort with UNION ALL?  because in the single row
scenario, each pair is unique

the PK for the intersection table would be ( book1, book2 ), right?

so let's say that your app is in place, everything's running along smoothly,
and a user makes a relationship between book 1234 and book 6789 --

    1234  6789

in the single-row scenario, what's to stop some other user, later, from
trying to relate book 6789 to book 1234?  you might think that this would
result in the insertion of

    6789  1234

how would you prevent this?  the PK wouldn't catch it, right?  would you
have to declare a separate UNIQUE constraint on the pair of keys in the
other order?

no, the answer is actually quite simple

the app always inserts the row with book1 < book2

;o)