[thelist] Database Schema

Mattias Thorslund mattias at thorslund.us
Sun May 28 14:26:32 CDT 2006


No recommendations one way or the other, but here's my analysis.

"Approach one" means more work in implementing the log. Adding one more
"field" should be fairly easy as long as your code is written to take
that into account. Log analysis could be done in multiple ways, quite
powerfully. But writing a SQL statement that adds each "field" as a
separate column in the output, and avoiding row duplication would take
an exercise in sub-queries or CASE statements and the like.

I have a serious aversion against table structures that necessitate
field names like "field_1", "field_2", etc.  If at any time in the
future you need to log four fields for a certain action, you'll need to
add a "field_4", and modify your SQL statements to account for it. A
variation on this is to include all the required fields with their real
names, leaving them blank when they don't apply. Using variable-width
data types would make the extra space needed pretty much negligible.

There's also "Approach three":

The activity table has fields:
id
action_id (foreign key to actions table)
user_id (foreign key to users table)
date_time
data (serialized data that describes the specifics of the action)


I would not recommend that for anything else but logging, however. Also, it will limit you when analyzing the data in the "data" column. But it is flexible, in the sense that adding a new thing to log in "data" is easy and does not require any table modifications. Watch out if the format in the serialized data changes, it could invalidate the old data. Safest approach would be a serialized associative array, I think.



Hershel Robinson wrote:
> I am designing a large database schema for a web app. The app manages 
> the business activities of a direct marketing web retailer. The issue in 
> question is how to store a log of all activities that occur in the 
> system. Actions we store include:
>
>   - client logs in to the system
>   - client adds a product to his cart
>   - client edits the details of product
>   - client logs out
>   - a Sales Agent sends an email
>   - a Sales Agent sends a fax
>   - a Sales Agent makes a phone call to a prospective client
>   - a Sales Manager assigns a prospective client to a certain Sales Agent
>
> There are many more actions, some of which are unique to the particular 
> business in question. The question is which of the these two approaches 
> for the DB table structure:
>
> APPROACH ONE:
>
> The activity table has fields:
> id
> action_id (foreign key to actions table)
> user_id (foreign key to users table)
> date_time
>
> There is then a table called activity_fields with fields:
> id
> activity_id (foreign key to activity table)
> field_name
> field_value
>
> The actual activity (a login or phone call) is a record in the activity 
> table. Each datum which needs to be stored with that action will be 
> stored in the activity_fields table. Thus the phone number which was 
> called or the email address to which an email was sent or the product id 
> and quantity of a product added to a cart will be stored as field_name 
> field_value pairs--each datum becoming one record in the activity_fields 
> table.
>
> APPROACH TWO:
>
> The activity table has fields:
> id
> action_id (foreign key to actions table)
> user_id (foreign key to users table)
> date_time
> field_1
> field_2
> field_3
>
> The actual activity (login or phone call) is again a record in the 
> activity table, but now it has with it as well as any relevant 
> details--each one being stored in one of the generic fields, field_1, 
> field_2, etc. The mapping of what each field means for each action is 
> stored in the actions table. So for a phone call, the phone number would 
> be in field_1, for an email the email in field_1, for a client adding a 
> product to a cart, the product id would be in field_1 and the amount 
> added in field_2. Should we find that we need more fields, we can always 
> add more in the future.
>
> Those are the two approaches we have come up with.
>
> I have my personal opinion on this subject, but I would like to hear 
> input from Evolters, if anyone is interested. :)
>
> Thanks,
> Hershel
>
>   




More information about the thelist mailing list