[thelist] Database Schema

Hershel Robinson hershel at galleryrobinson.com
Mon May 29 05:15:49 CDT 2006


Mattias Thorslund wrote:
 > There's also "Approach three":

<snip />

 > Safest approach would be a serialized associative array, I think.

This is an interesting idea. I would actually suggest that XML would be 
an ideal data structure for such an approach. I am uncertain if this 
will actually be a better solution for my particular situation, but I 
will discuss this with my team.

Then Ken Moore wrote:
 >> I would suggest that you Google for a short tutorial on Third Normal
 >> Form and read a very basic book on SQL. Between the two, you should
 >> have a solid foundation to build on.

Thank you for your suggestion. You are correct that a short tutorial on 
3NF and a very basic book on SQL would provide a person a solid 
foundation. For more advanced applications, however, one would 
eventually strive to move beyond the basics into the realm of advanced 
database design. One will find there concepts and practices beyond the 
basics and one will find the exceptions to the rules. Most rules indeed 
have exceptions. :)

Mattias Thorslund wrote:
> Well, there are cases where experienced database designers intentionally
> ignore 3NF. Logging could very well be such a case, since it happens
> often, and must be fast in order to get out of the way quickly, leaving
> some resources for what the application is actually meant to do. Doing
> 3-5 inserts rather than 1 for each log event could mean a noticeable
> performance hit. Or not, depending on the amount of traffic, the
> hardware and other things.

With logging, there is also the issue of data retrieval performance as 
well. The system in question is expected to experience peak traffic of 
around 2000 clients logged in plus 200 internal users, all 2200 of whom 
are generating log events. So data insertion and retrieval need to be 
optimized where possible.

> So, there are no absolutes. The best choice depends on the requirements.

Well put.

That said, I have not yet heard a clear argument against my original 
Approach Two. Mattias said:

"If at any time in the
future you need to log four fields for a certain action, you'll need to
add a "field_4", and modify your SQL statements to account for it."

This in practice will not be an issue, because any query will inherently 
be fetching either none of the generic fields (for just a list of 
actions) or all of them (select *) for full data reporting. This not to 
mention that fact that one can, if one needed to, retrieve a list of 
fields from the table itself via SQL. :)

I am aware of large-scale systems in production who use my Approach Two 
for logging purposes. One database architect associated with two such 
systems told me that his team is satisfied with the performance and 
usage of this method for those systems.

Thanks,
Hershel

-- 
Gallery Robinson Web Services
http://web.galleryrobinson.com/



More information about the thelist mailing list