[thelist] Hierarchical data in the database

Bill Moseley moseley at hank.org
Sun Jan 21 21:09:21 CST 2007

On Sun, Jan 21, 2007 at 08:10:55PM -0500, Matt Warden wrote:
> You say:
> On 1/21/07, Bill Moseley <moseley at hank.org> wrote:
> > The files are currently under revision control, so need to keep some
> > of those features -- that is, I want to track previous versions.
> But your example table has:
> >     create table template (
> >         id              SERIAL PRIMARY KEY,
> >         path            text,     -- e.g.  path/to/templates/footer.html
> >         current         boolean,  -- true if this is the current template
> >         created_time    timestamp(0) with time zone not null default now(),
> >         update_by       integer REFERENCES person,
> >         CONSTRAINT      one_current_template UNIQUE( path, current )
> >     );
> >     create index template_path ON template ( path );
> >     create index template_current ON template ( current );
> >From what I can tell, you will still only have one version of the file
> (which seems to make the current field even more meaningless). You
> aren't actually storing the files in the database, just the path.

Sorry, just left it out of my post -- it's been a long weekend:

    alter table template add column content text;

Better? ;)

> You haven't explained all of what you need to do with these files, so
> this is quite an assumption, but my guess is that you don't need all
> of this. My guess is that you can get a happy medium by storing the
> path and filename of each file in separate fields. The only thing this
> can't represent is a folder with no files.

By path do you mean directory (without the file name)?

I need two ways to access the "files" in the database.  One way is
directly by path name  path/to/template/foo/header.html, for example.
This has to be fast since a page might have twenty or so templates to
fetch to build the entire page.  This suggests a single column named
"path" holding the full path (or a (name, dir) set of columns).

I also need to access the files in the web interfaced used to edit the
templates.  I'd like to provide a file-manager type view, showing top
level "folders" that can be expanded to show their contents.  To me this
suggests some kind of hierarchy similar to what I showed.

So, my question is how best to support both of those requirements in the

> > I could add a "path" column to template_node, but that's
> > de-normalizing the table too much for my tastes.  But, maybe that is
> > the best solution?
> I should learn to read ahead. table(id, fullpath) is no more
> normalized than table(id, path, filename).

I was commenting that with my template_node example there was nothing
to prevent column "parent" pointing to "foo" but column 
"path" set to "path/to/bar/<name>" -- that is the path's parent is
different from the node's parent.  Bad data.

> If you're okay with your
> first table structure above, then you should be fine with moving the
> path out into its own field. Denormalization is not a bad thing if it
> doesn't impact your requirements (including potential future ones,
> which is obviously harder to tease out -- but, again, my guess is that
> no requirement you ever have will require the hierarchical node
> structure you've built in your second solution).

So are you saying for path/to/templates/foo/heading.html the columns
would look like this?

    name    = 'heading.html'
    dir     = 'path/to/templates/foo'

And then to show all the files in the root select "where dir IS NULL"
or for the "templates" directory 'where dir = "path/to/templates"'?
But that won't show the sub directories.  I guess I'm not following
how to do the folder-by-folder listing.

Thanks for the comments, Matt.

Bill Moseley
moseley at hank.org

More information about the thelist mailing list