[thelist] Hierarchical data in the database

Matt Warden mwarden at gmail.com
Sun Jan 21 19:10:55 CST 2007


Hi Bill,

You say:

On 1/21/07, Bill Moseley <moseley at hank.org> wrote:
> The files are currently under revision control, so need to keep some
> of those features -- that is, I want to track previous versions.

But your example table has:

>     create table template (
>         id              SERIAL PRIMARY KEY,
>         path            text,     -- e.g.  path/to/templates/footer.html
>         current         boolean,  -- true if this is the current template
>         created_time    timestamp(0) with time zone not null default now(),
>         update_by       integer REFERENCES person,
>         CONSTRAINT      one_current_template UNIQUE( path, current )
>     );
>     create index template_path ON template ( path );
>     create index template_current ON template ( current );

>From what I can tell, you will still only have one version of the file
(which seems to make the current field even more meaningless). You
aren't actually storing the files in the database, just the path.
You'll have data on when previous versions were created, but not any
of the content.

> Another approach would be to have a table like this:
>
>     create table template_node (
>         id          SERIAL PRIMARY KEY
>         name        text, -- i.e. "footer.html"
>         parent_node int REFERENCES template_node,
>         is_leaf     boolean
>     );
>
> Which makes it easy to show all the templates and nodes in the "root"
> (where parent_node IS NULL), or all the templates in the foo directory
> (where parent_node = <id of foo node>).
>
> But, then how does one access /path/to/foo/footer.html efficiently?

You haven't explained all of what you need to do with these files, so
this is quite an assumption, but my guess is that you don't need all
of this. My guess is that you can get a happy medium by storing the
path and filename of each file in separate fields. The only thing this
can't represent is a folder with no files.

> I could add a "path" column to template_node, but that's
> de-normalizing the table too much for my tastes.  But, maybe that is
> the best solution?

I should learn to read ahead. table(id, fullpath) is no more
normalized than table(id, path, filename). If you're okay with your
first table structure above, then you should be fine with moving the
path out into its own field. Denormalization is not a bad thing if it
doesn't impact your requirements (including potential future ones,
which is obviously harder to tease out -- but, again, my guess is that
no requirement you ever have will require the hierarchical node
structure you've built in your second solution).

> Or maybe I could create a table for looking up by path, but I suspect
> that's the same thing as adding a "path" column.

Yes, except you can represent empty folders and update path names
*slightly* more easily.


-- 
Matt Warden
Cleveland, OH, USA
http://mattwarden.com


This email proudly and graciously contributes to entropy.



More information about the thelist mailing list