[thelist] Null Performance

rudy limeback r937 at interlog.com
Tue, 18 Jan 2000 09:27:06 -0500


> We are kinda thinking about approach one or two.
> Somehow my gut feeling is that number two wouldn't work

hi michiel

there are many pros and cons for designs like the ones you outlined

the pros and cons become more pronounced in large database and high access
situations

you may be tempted to have a denormalized table with repeating fields and
nulls where the customer doesn't have that asset type (your option 2), but
i *urge* you not to go there

how often do you get new asset types?  this is new millennium tech, a new
device tomorrow wouldn't surprise me

so then you'll have to change that table (and in large database and high
access situations you don't just change the table, you call a meeting with
the development team to describe the programming changes you're forcing on
them)

in addition, you have the added problem of dealing with nulls, not always a
straightforward task

option 3 looks like your best bet, if slightly modified

> A sample set of rows ( just to make it clear ) would be :
> Arguments: ID | Asset ID | Argument Name | Value
> Assets:    ID | Customer ID | Asset Type
> Customers: ID | Name | Info | Etc...

your Assets tables looks like some kind of linking table

let me yank it and add it back as the table which defines Asset independent
of the Customers

  Customers: ID | Name | Info | Etc...

  CustAssets:  ID | CustID | AssetID | Etc.

  Assets:  ID | Asset Name

CustAssets has a foreign key to each of the other two tables, which are in
a typical many-to-many relationship

each Customer has one or more CustAssets, CustAsset records do not exist
for assets that the customer does not have

each CustAsset has an AssetID which points back to the master Assets table,
which has five or six rows or however many asset types there are

unless you are producing formatted listings, you usually don't need to join
to the Assets table in simple queries

here's a good example of when it's okay *not* to use a surrogate key as the
primary key -- instead of an autonumber or sequence number for Assets, use
a two-character code as the ID, then the foreign keys in the CustAssets
table are two characters (which is less space than they would have required
for an autonumber), and as a side-benefit they are semi-meaningful, so you
can forego doing the join to the Assets table in even more queries... it's
the same argument that favours using a 2-character code as the primary key
for postal state codes

okay, all searches would probably start on the Customer table, since in
huge databases, you don't often have queries like "list all customers that
have a cable telephone" -- more realistically, you might want to find all
customers with incomes over 100K living in Lower Swansea (this allows the
query to search Customers first) who have cable telephone (and then join to
the CustAsset records only for qualified customers)

some databases allow you to tune where the child CustAsset records are
stored (i.e. physically close to the parent Customer) so that once Customer
records are chosen, no additional physical read off the database drive is
required, the CustAsset records are already in the buffer...

there are two ways to get optimum scalability

if the database already exists, you start to tweak stuff like where you
physically place records and what kind of indexes you declare -- this is
where you need a certified dba

if the database doesn't exist yet, you obtain optimum scalability by
designing in third normal form (a fifty-cent word meaning no repeating
groups, all columns functionally dependent on the entire primary key) and
then examining the sql that your most frequent queries will use

because of database indexes, performance will usually always scale well for
the "transactions" -- e.g. insert a customer, change a customer's status,
remove an asset from a customer -- because you usually supply the CustID
and the database server just pulls out the records, doesn't matter how big
the database is...

queries that have to search, or report large volumesof results, those are
the ones you analyse for performance

watch out for stuff like "who are the top salesmen in each area for the
last three quarters" -- that's when you move away from the realm of
transaction databases and into data warehousing...


i hope this all helps, i love this stuff, and i'll answer more questions on
or off list if you wish...

___________________
rudy limeback
http://www.interlog.com/~r937/
http://evolt.org/