[thelist] denormaliation is not a dirty word

rudy r937 at interlog.com
Thu Oct 17 18:45:02 CDT 2002


some comments on remarks by steve and simon

nothing critical, just some observations...

> ... I've solved the problem in two parts using views in MS SQL2K.
> The first view links together various values in relational tables and
> the second collects total values of sales from the first view for each
> type of item available.

views are seriously underrated as a design strategy

good database design anticipates which tables get used together

views make development easier, too

> ... in the database there is so much that is going on "behind
> the scenes" that I find it very difficult to gain a feeling for which
> solution is best.

there is more going on behind the scenes in database performance than any
one person can hope to understand in a lifetime, and by then it would be
too late, because the stuff you knew only a few years ago is really not
worth much today, even assuming you could remember it...

> I can time queries, but beyond that I always find I have a
> sneaking feeling that I could have done it better.

i know that feeling all too well, although i don't let it bother me   ;o)

if you know how to time them, you're on your way

if you can understand EXPLAIN output, you're cookin'

> What this leads me to is whether it is always better to use
> the database, or whether the advantage of clearly-read
> procedural code is worth a performance hit (within certain bounds).

apples and horses

let us try to disentangle performance from maintainability

will the database always outperform procedural code?

not always, but the rule of thumb is yes

will the database always be more maintainable than procedural code?

not always, but again, the rule of thumb is yes

oh my goodness, yes

the expression "clearly-read" has nuances which i would love to discuss one
day, but let us put maintainability aside, and assume (ha!) that it's just
as easy (chortle!) to change thousands of lines of code (is my bias
showing?), as it is to change some table definitions and query/view
statements...


> Now the client says that they would like to see the list of
> people working on that project as well as the other information,
> and here's where the problem arises.

the problem, as we shall see, is not a database problem at all, it is a
problem of application design and how to make a change that does not blow
your application up

> For each project you can have many people working on it, and so
> we have a one-to-many relationship. A left join in this situation is no
> use as for each project / row returned, you would end up with as many
> rows as you had people working on a project.

exactly, and your description is excellent

that is exactly what the left join does, it lists people by project

is a left join the answer here?  yes

three separate queries would do the job, too -- one for projects, one for
people, and one for the projects that people are on

three queries in succession, rudy?  are you nuts?  no, because the total
size of three record sets might come across the network much faster than
a record set produced by a join, because -- deep breath -- there are
redundant values in the "one" columns for the various sets of "many" rows,
like the project name, project description, start and end dates, la de dah,
repeated on all the people rows that are in each project

> So, SQL Server now had to find a list of the people working on
> each project and then concatenate them all in a string so that this
> list of people could be viewed along with the other project information

that's a common solution, a simple denormalization

comma-separated list is the most frequent way, too

picture a row with project name, description, start and end dates, la de
dah, and then tacked onto the end, a list of employee initials

you'd be surprised (or maybe not) how often this comes up on the various
database forums i hang out on

> this was achieved by running a function from the select statement.
> Now this ran like an absolute dog, for the simple reason that databases
> are meant for storing and returning information, *NOT* for concatenating
> strings together.

yeah, but databases concatenate real well too!  it's like gravy!

i wouldn't have used a function, just a simple stored procedure

> Unfortunately for this client, their decision to put all the application
> *AND* display logic together into this COM object meant that nothing
> could really be done without a complete re-write of the application.

ah, the infamous com object

well, serves you right

> This example has only served to reinforce my opinion that a database
> abstraction layer is an important *thin* layer to implement on any major
> project -- this layer would have handled the above example in the
> language of the application, whatever that may be -- PHP, ASP, CF, etc

no -- stored proc, stored proc, stored proc

then you wouldn't have to rewrite the abstraction layer when you switch
from PHP to ASP to CF (or wish to develop an app that will run with any
language)

> you can migrate to another database easily enough
> by just changing the database abstraction layer.

in general, i agree with this, but the need for it is dubious

consider what you might put into a database abstraction layer

for example, everyone who writes queries eventually realizes that "standard
sql" is an oxymoron

consider just the syntax differences between microsoft access and
sql/server

but syntax differences are not what you should abstract

what you should abstract is

  - select projects
  - select people
  - select projects plus their people
  - select people plus their projects

now in the case we were discussing earlier, the "function" which was
written into the select statement to denormalize the people for each
project, i.e. to deliver one row per project, with comma-separated people,
that's clearly a variant of the third query above -- a minor variant, in
that it denormalizes the result set of a left join

in a very broad sense of the word, it is a "view"

it is -- drumroll, please -- emphatically *not* application logic

it is database logic -- in a broad sense, a "view" -- and if standard sql
had a way of giving it to you, you'd be all over it

by comparison, if your database has CUBE and ROLLUP extensions, you'd be
nuts to write the same functionality yourself (like we used to have to do,
back in the database dark ages)

so the "function" involves denormalization -- so what?

hey, denormalization is not a dirty word, it has real, practical usage

but before i get too far off on a tangent, let me say simply that the
"database abstraction" in this example should be moved closer to the
database

> There is a slight performance penalty perhaps with this
> additional layer, but it keeps code out of the database,
> which I think is important for maintainability and readability

sorry, got to disagree

performing the denormalization on the application side of the
application-database interface is a very poor solution to the problem of
understandability

you are paying (with dubious results, i might suggest) a performance
penalty for questionable maintainability

if you've read this far, and you understand the one-to-many and
denormalization concepts, then the only thing you have left to learn to
become an sql guru, is when to use a query, the raw child, when to use a
view, the adolescent, and when to use a stored procedure, the lover, the
mysterious, the all-powerful, the inescapable...

> I'm not a big fan of stored procedures as you can imagine! ;)

probably because all your experiences were with wanton application hussies
masquerading as decent database procedures!!

remember, don't put application logic into stored procs, just database
logic!!


rudy







More information about the thelist mailing list