[thelist] Database Schema
Hershel Robinson
hershel at galleryrobinson.com
Mon May 29 05:15:49 CDT 2006
Mattias Thorslund wrote:
> There's also "Approach three":
<snip />
> Safest approach would be a serialized associative array, I think.
This is an interesting idea. I would actually suggest that XML would be
an ideal data structure for such an approach. I am uncertain if this
will actually be a better solution for my particular situation, but I
will discuss this with my team.
Then Ken Moore wrote:
>> I would suggest that you Google for a short tutorial on Third Normal
>> Form and read a very basic book on SQL. Between the two, you should
>> have a solid foundation to build on.
Thank you for your suggestion. You are correct that a short tutorial on
3NF and a very basic book on SQL would provide a person a solid
foundation. For more advanced applications, however, one would
eventually strive to move beyond the basics into the realm of advanced
database design. One will find there concepts and practices beyond the
basics and one will find the exceptions to the rules. Most rules indeed
have exceptions. :)
Mattias Thorslund wrote:
> Well, there are cases where experienced database designers intentionally
> ignore 3NF. Logging could very well be such a case, since it happens
> often, and must be fast in order to get out of the way quickly, leaving
> some resources for what the application is actually meant to do. Doing
> 3-5 inserts rather than 1 for each log event could mean a noticeable
> performance hit. Or not, depending on the amount of traffic, the
> hardware and other things.
With logging, there is also the issue of data retrieval performance as
well. The system in question is expected to experience peak traffic of
around 2000 clients logged in plus 200 internal users, all 2200 of whom
are generating log events. So data insertion and retrieval need to be
optimized where possible.
> So, there are no absolutes. The best choice depends on the requirements.
Well put.
That said, I have not yet heard a clear argument against my original
Approach Two. Mattias said:
"If at any time in the
future you need to log four fields for a certain action, you'll need to
add a "field_4", and modify your SQL statements to account for it."
This in practice will not be an issue, because any query will inherently
be fetching either none of the generic fields (for just a list of
actions) or all of them (select *) for full data reporting. This not to
mention that fact that one can, if one needed to, retrieve a list of
fields from the table itself via SQL. :)
I am aware of large-scale systems in production who use my Approach Two
for logging purposes. One database architect associated with two such
systems told me that his team is satisfied with the performance and
usage of this method for those systems.
Thanks,
Hershel
--
Gallery Robinson Web Services
http://web.galleryrobinson.com/
More information about the thelist
mailing list