[thelist] metadata XML idea (longish)

PeterV peter at poorbuthappy.com
Thu Apr 11 14:15:01 CDT 2002


Hi all,
I've been working on this idea, but it's too big to implement by myself
(I'm not that good a coder), so I thought I'd throw it out to the group.

The idea: a way of keeping metadata that allows for:
* Auto-generated navigation that can link to similar articles, even on
OTHER websites that publish a metadata map in the same XML format (e.g. "See
other articles about Perl on the following websites: ...")
* A standard for auto-generating almost all navigation on a site, including
alternate search terms (e.g. Did you mean "Perl"?)

I have been playing around with this as an idea for software to manage
faceted metadata. I'm not a software developer (although I've written a CMS
or two), so I probably won't be implementing all of this myself...

Faceted metadata is used to generate navigation like "More about this
topic", or: "Related articles". Current faceted metadata systems are
proprietary and usually not very powerful or flexible. I propose to
develop an open XML standard notation (already half finished), software to
manage metadata (which could be sold?) and software to generate navigation
(which could be in the form of free libraries). Initial trials of similar but
simpler systems on a site of mine have proven pretty successful. It
seems to work for other sites as well.

My idea is based on the ISO topic maps standard (topicmaps.org), but a
lot simpler. The cool bit: you can PUBLISH your metadata, IMPORT metadata or
MERGE your metadata with other metadata based on merge-rules.

This gives a lot of power, yet doesn't have to be as complex as XTM (the
XML language to represent topicmaps, which is being adopted very slowly,
if at all, because of its complexity).

Here's a use scenario:
- I have a website about clothes with too many content pages; the simple
navigation isn't sufficient anymore.
- I get the software, I generate a "topic map" of my content (with types
of clothes, colors, styles, ...)
- I put server side includes in my pages (generated by the software)
that generate additional navigation
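
A rough sketch of what the generated include could be built from, in Python. The
page URLs, topic names and the functions related_pages() and render_links()
are all made up for illustration; the real software might rank and render
quite differently.

```python
from html import escape

# Hypothetical occurrences: each page is tagged with topics from several facets.
CLOTHES_PAGES = [
    {"url": "/shirts/red-linen", "name": "Red linen shirt",
     "topics": {"shirts", "red", "summer"}},
    {"url": "/shirts/blue-oxford", "name": "Blue oxford shirt",
     "topics": {"shirts", "blue"}},
    {"url": "/dresses/red-summer", "name": "Red summer dress",
     "topics": {"dresses", "red", "summer"}},
    {"url": "/hats/wool-beanie", "name": "Wool beanie",
     "topics": {"hats", "winter"}},
]


def related_pages(url, pages):
    """Return the other pages that share a topic with `url`, most overlap first."""
    current = next(p for p in pages if p["url"] == url)
    scored = []
    for page in pages:
        if page["url"] == url:
            continue
        shared = len(current["topics"] & page["topics"])
        if shared:
            scored.append((shared, page))
    # sort() is stable, so pages with equal overlap keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [page for _, page in scored]


def render_links(pages):
    """Render pages as an HTML list suitable for a server side include."""
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(escape(p["url"]), escape(p["name"]))
        for p in pages
    )
    return "<ul>" + items + "</ul>"
```

Calling render_links(related_pages("/shirts/red-linen", CLOTHES_PAGES)) would
give the "Related articles" fragment for that one page.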

Another scenario:
- I have a number of weblogs, but I want more metadata power than just
categories - I want to link between them as well.
- I get the software and write a topic map for each topic.
- then I go to other weblogs, look at their published topic maps, and
take some topics from them and IMPORT and MERGE them into my map
- I can then generate navigation like: "More about this topic (XML
standards) on other weblogs: weblogA, weblogB"

So it's not a content management system; it's the metadata system (which
on most CMSs is pretty bad), and it can work with any CMS.
You can import topic maps, or parts of them (for example, you could
import a facet (published somewhere else) with all the countries, so you
wouldn't have to write that yourself.) Or you could import an entire topic map
published by someone else, and use that. You could adapt it, yet keep
links between PUBLISHED TOPICS, so the navigation can auto-generate links to
other sites.
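
To make the merge step a bit more concrete, here is a minimal Python sketch
that assumes the simplest possible merge rule: two topics are the same topic
if they carry the same published ID. The dictionary layout, the example.org
URLs and the function merge_maps() are assumptions for illustration only; the
actual merge rules are still undecided.

```python
def merge_maps(local, imported):
    """Merge `imported` topics into a copy of `local`, keyed by published ID.

    Topics with the same published ID are treated as identical and their
    occurrence URLs are combined; all other imported topics are added as new.
    """
    merged = {
        pid: {"name": topic["name"], "urls": list(topic["urls"])}
        for pid, topic in local.items()
    }
    for pid, topic in imported.items():
        if pid in merged:
            for url in topic["urls"]:
                if url not in merged[pid]["urls"]:
                    merged[pid]["urls"].append(url)
        else:
            merged[pid] = {"name": topic["name"], "urls": list(topic["urls"])}
    return merged


# Hypothetical maps from two weblogs that both point at the same published topic.
MY_MAP = {
    "http://example.org/map.xml#xml-standards": {
        "name": "XML standards",
        "urls": ["http://example.org/weblogA/12"],
    },
}
OTHER_MAP = {
    "http://example.org/map.xml#xml-standards": {
        "name": "XML specs",
        "urls": ["http://example.org/weblogB/7"],
    },
    "http://example.org/map.xml#perl": {
        "name": "Perl",
        "urls": ["http://example.org/weblogB/9"],
    },
}

MERGED_MAP = merge_maps(MY_MAP, OTHER_MAP)
```

In this sketch the local topic name wins on a clash, and the combined URL list
is what "More about this topic on other weblogs" would be generated from.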

The entire concept is based on XTM (the topic map standard), but
simplified and adapted for the web. The XTM standard is too complex and
won't be widely adopted for years. So I think there is room for a simpler
(but compatible) standard to use right now.

Another way of looking at the concept is that it's a standard way to
manage an extremely advanced thesaurus/controlled vocabulary for your website.

Finally: one more scenario:
- I type a word into the search box, the software recognizes it as a
synonym of a term in the vocabulary, and shows me all matching pages, with
the most important page highlighted.
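
The lookup in that scenario could be as simple as the following Python sketch.
The topic data is a cut-down, made-up sample, and normalize(), build_index()
and resolve() are hypothetical helper names, not part of any existing library.

```python
# Hypothetical topics with their search alternatives (misspellings, synonyms).
SEARCH_TOPICS = {
    "t1": {"name": "diving in Colombia",
           "alternatives": ["diving in Columbia"]},
    "t2": {"name": "nightdiving in Colombia",
           "alternatives": ["scubadiving", "scuba diving"]},
}


def normalize(term):
    """Lowercase and collapse whitespace so that lookups are forgiving."""
    return " ".join(term.lower().split())


def build_index(topics):
    """Map every topic name and search alternative to its topic id."""
    index = {}
    for topic_id, topic in topics.items():
        for term in [topic["name"]] + topic["alternatives"]:
            # If two topics claim the same term, the first one wins.
            index.setdefault(normalize(term), topic_id)
    return index


def resolve(query, index):
    """Return the topic id a query refers to, or None if nothing matches."""
    return index.get(normalize(query))


SEARCH_INDEX = build_index(SEARCH_TOPICS)
```

Once a query resolves to a topic id, the pages to show are simply the
occurrences attached to that topic.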

And a rough sketch of the XML language:
<topicmap>
  <!-- facets -->
  <facet name="countries" id="f1" />
  <facet name="things to do in Colombia" id="f2" />
  <!-- association types: occurrences of topics are of a certain type:
       a picture, a discussion, ... -->
  <association name="pictures" id="a1" />
  <association name="discussion" id="a2" />
  <association name="article" id="a3" />
  <!-- topics -->
  ...
  <topic name="nightdiving in Colombia" facet="f1" id="t2">
    <searchalternative type="misspelled" name="diving in Columbia" />
    <searchalternative type="misspelled" name="diving in colombia" />
    <searchalternative type="synonym" name="scubadiving" />
    <searchalternative type="synonym" name="scuba diving" />
    <parent id="t1" /> <!-- where t1 is the topic "diving in colombia" -->
    <!-- this published id could be another topic on another, more
         authoritative map, thereby defining this topic as identical to
         another published topic, which makes merging easier -->
    <publishedID url="http://colombia.com/map.xlm#t2" />
  </topic>
  <!-- occurrences: for each URL have occurrences -->
  <occurrence>
    <url value="http://colombia.com/discussion/184" />
    <name value="Discussion on diving" />
    <author value="au1" />
    <topicid id="t1" association="a1" /> <!-- pictures about diving in Colombia -->
    <topicid id="t1" association="a2" /> <!-- discussion about diving in Colombia -->
    <topicid id="t4" association="a2" /> <!-- discussion about topic 4 -->
  </occurrence>
  <!-- authors (can be websites, ...) -->
  <author id="au1" name="Colombia official website">
    <definition type="URL" value="http://officialcolombia.com" />
  </author>
  <!-- merge-rules: still working on this part, based on published ID in
       topics. I'm not even sure we need merge rules... -->
</topicmap>

The above obviously needs work, and it can probably be simplified... ;)
It basically is XTM, but with constraints and predefined things like topic
types (= facets), so that writing software for it becomes a lot simpler. It is
focused on web use, so hopefully it gets picked up by people.
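
As a hint of how little code a reader for such a format might need, here is a
small Python sketch using the standard library's xml.etree.ElementTree. It
assumes a well-formed fragment of the sketch (closed tags, no inline notes);
the function load_topics() and the resulting dictionary layout are
illustrative assumptions, not a proposed API.

```python
import xml.etree.ElementTree as ET

# A well-formed fragment in the spirit of the rough sketch above.
SAMPLE_MAP = """
<topicmap>
  <facet name="countries" id="f1" />
  <topic name="nightdiving in Colombia" facet="f1" id="t2">
    <searchalternative type="synonym" name="scubadiving" />
    <searchalternative type="synonym" name="scuba diving" />
    <parent id="t1" />
  </topic>
</topicmap>
"""


def load_topics(xml_text):
    """Read <topic> elements into a dict keyed by topic id."""
    root = ET.fromstring(xml_text)
    topics = {}
    for element in root.findall("topic"):
        parent = element.find("parent")
        topics[element.get("id")] = {
            "name": element.get("name"),
            "facet": element.get("facet"),
            # A topic without a <parent> element sits at the top of its facet.
            "parent": parent.get("id") if parent is not None else None,
            "alternatives": [
                alt.get("name") for alt in element.findall("searchalternative")
            ],
        }
    return topics


TOPIC_TABLE = load_topics(SAMPLE_MAP)
```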

For anyone who is still with me after this really long post, and if it made
any sense at all: any ideas/comments?
Peter




