[thelist] Architecture for arbitrary data management

Joe Flintham list at menticulture.com
Fri Feb 29 07:39:05 CST 2008


Hi Evolt

I'm looking for some advice on a problem I'm trying to solve.  I'm 
building a system that will store data created by users, which is 
normally trivial to achieve with, say, an RDBMS.

The thing is, the data I need to store will be fairly arbitrary: one 
user may submit a set of data such as:

id: 1
title: "Hello world"
tag: "foo bar"

while another user may submit data such as:

id: 2
maplocation: "/maps/image.jpg"
latitude: 12.345
longitude: 12.345
scale: 0.001
tag: "foo bar"

Basically, I'm trying to design a system that can store a fairly 
arbitrary set of key:value pairs, but can still quickly retrieve subsets 
of them using complex queries, as in any RDBMS.

I've worked through a number of possible solutions (flat XML files, one 
big SQL db, many small SQL dbs, combinations of all these, etc.) and all 
of them have trade-offs.  XML files are perfect for storing such 
arbitrary data, but running queries across thousands of them will 
probably be prohibitively slow.  SQL databases can do the fast queries, 
but I'm not sure how I can effectively normalise the arbitrary data 
without ending up with an arbitrary number of tables which then need to 
be joined.  If I don't normalise the data, then an SQL db is probably 
the wrong solution.
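
To make the "one big SQL db" option concrete, here is a minimal sketch of 
one possible layout: a single table with one row per key:value pair 
(often called entity-attribute-value).  It is only an illustration under 
some assumptions, not a recommendation: the table name item_attr, the 
helpers store/find/load, and the choice to keep every value as text are 
all made up for the example, and Python's built-in sqlite3 module stands 
in for whatever SQL database the stack provides.

```python
import sqlite3

# Hypothetical schema: one row per (item, key, value) triple, so the
# set of keys can differ from item to item without adding tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_attr (
        item_id INTEGER NOT NULL,
        attr    TEXT    NOT NULL,
        value   TEXT    NOT NULL,
        PRIMARY KEY (item_id, attr)
    )
""")
# Index so that lookups by key and value do not scan the whole table.
conn.execute("CREATE INDEX idx_attr_value ON item_attr (attr, value)")


def store(item_id, data):
    """Insert one row per key:value pair; values are stored as text."""
    conn.executemany(
        "INSERT INTO item_attr (item_id, attr, value) VALUES (?, ?, ?)",
        [(item_id, key, str(val)) for key, val in data.items()],
    )


def find(**criteria):
    """Return sorted ids of items that match every key=value given."""
    if not criteria:
        return []
    clauses = " OR ".join("(attr = ? AND value = ?)" for _ in criteria)
    params = [p for key, val in criteria.items() for p in (key, str(val))]
    # An item matches only if it has one matching row per criterion.
    rows = conn.execute(
        f"SELECT item_id FROM item_attr WHERE {clauses} "
        "GROUP BY item_id HAVING COUNT(*) = ? ORDER BY item_id",
        params + [len(criteria)],
    )
    return [row[0] for row in rows]


def load(item_id):
    """Rebuild the key:value pairs of a single item as a dict."""
    rows = conn.execute(
        "SELECT attr, value FROM item_attr WHERE item_id = ?", (item_id,)
    )
    return dict(rows)


# The two example submissions from above.
store(1, {"title": "Hello world", "tag": "foo bar"})
store(2, {"maplocation": "/maps/image.jpg", "latitude": 12.345,
          "longitude": 12.345, "scale": 0.001, "tag": "foo bar"})
```

With this layout, find(tag="foo bar") would return both items and 
find(tag="foo bar", latitude=12.345) only the second.  The trade-off is 
visible in the sketch: every value is text, so numeric range queries 
would need casts or typed value columns, and each extra condition adds 
work to the grouping step.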

I've also looked at CouchDB [1], which is designed very much for this 
kind of thing, but I'm specifically looking to work with the commonly 
available services on a LAMP stack.

Any thoughts on this problem would be much appreciated :)

Thanks

Joe

[1] http://www.couchdbwiki.com/index.php?title=CouchDb_Quick_Overview



