[thelist] DB design question

rudy r937 at interlog.com
Fri Dec 6 18:16:01 CST 2002


> Isn't it gonna get pretty complex to verify uniqueness on address,
> apartment, and city for each entry? Or, once we build the constraints
> into the db, is that okay?

yes, that's the answer to the question i posed

in most cases, you *want* to declare a unique constraint on the real key

now, in this instance, property addresses, you would include as many columns
into the constraint (street number, apartment number, street name, city
name) as necessary to ensure that the unique constraint will still let you
enter all the addresses in your scope

how do databases guarantee uniqueness?

let's say you're the database, and i ask you to insert a new row, and one of
the columns has been declared with a unique constraint

what do you have to do?  check to see whether that column value already
exists

what's the easiest and fastest way to do that?  look it up in an index
(an index permits b-tree searching and other efficiency goodness)

therefore, all databases, to my knowledge, enforce uniqueness constraints by
making sure there's an index -- in other words, when you declare a unique
constraint, the database creates an index, it's almost like self-defence on
its part

"so what?" i can hear you asking

well, having an index on the address turns out to be quite beneficial, apart
from its necessary role in guaranteeing uniqueness

an index on address means that if you want to look up rows based on a WHERE
clause involving the address, chances are the database will use the index!
holy serendipitous efficiency, batman!

it's amazing, really -- i've seen people declare an autonumber primary key,
and also declare an index on some other fields, and then wonder why they
still get "duplicate entries"

"how do you delete a duplicate entry" is probably the number one frequent
question on database discussion forums

(it's not trivial, either -- because even if you know which rows are
duplicates, how do you delete all but one of them?)

anyhow, my advice is to use surrogate keys where necessary, but take a
moment to declare a unique constraint on the "real" key too

a more experienced database administrator (and i do not count myself as one)
will be able to advise what sequence to declare the multiple columns in, to
take advantage of index search efficiencies, but i have a feeling that it
might be a good idea to split out the street number from the street name,
and make the street name the high-order column

rudy




More information about the thelist mailing list