[thelist] XML parsing

Mon Oct 7 10:09:01 CDT 2002

>Here is a question: what is easier for developers (using a wide variety
>of languages) to support:
><topic id="1" name="joe">
></topic>
><topic id="2" name="joe's kid" parent="1">
></topic>
>
>Or
>
><topic id="1" name="joe">
><topic id="2" name="joe's kid">
></topic>
></topic>
>
>(And you can have long trees (multiple nesting)). I need to know how
>hard one is support compared to the other, not just in programming
>languages, but even in things like Moveabletype tags, or other
>environments.
>Peter

Not sure if this has been answered already, but I would think the nested
version (child topics inside their parent) would be easier, at least from a
parser standpoint.

Coming from an MSXML background, the parser generates a tree of objects,
with each object having a parent-node property that points to its containing
node. Each piece of the tree is an object: elements, attributes, and even
text are each held in a separate node object.
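
To make that concrete, here is a minimal sketch. It uses Python's
xml.dom.minidom rather than MSXML (an assumption for illustration; both
expose a similar DOM-style tree), and the <topics> wrapper element and the
find_topic helper are hypothetical additions so the sample is a well-formed,
runnable document. It parses the nested form and reads the parent link that
the parser has already set up.

```python
from xml.dom import minidom

# Nested form of the example. The <topics> wrapper is an assumption added
# so the document has a single root element.
NESTED = """<topics>
<topic id="1" name="joe">
<topic id="2" name="joe's kid">
</topic>
</topic>
</topics>"""

doc = minidom.parseString(NESTED)


def find_topic(document, topic_id):
    """Return the <topic> element with the given id, or None (hypothetical helper)."""
    for el in document.getElementsByTagName("topic"):
        if el.getAttribute("id") == topic_id:
            return el
    return None


kid = find_topic(doc, "2")

# parentNode is filled in by the parser; no lookup code is needed.
parent = kid.parentNode
print(parent.getAttribute("name"))  # joe

# Attributes and text are nodes of their own as well.
name_attr = kid.getAttributeNode("name")
print(name_attr.value)                           # joe's kid
print(kid.firstChild.nodeType == kid.TEXT_NODE)  # True
```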

The flat version (with the parent attribute) would still be built the same
way in the parser, but it would have the additional attribute and text value
to worry about, and you'd have to program in the capability to look up parent
nodes yourself, which would still be slower than letting the parser do it
for you...
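
A rough sketch of that extra lookup work, again assuming Python's
xml.dom.minidom in place of MSXML. The <topics> wrapper, the by_id index,
and the parent_of helper are hypothetical names introduced only for this
example. The parser treats parent="1" as an ordinary attribute, so the
relationship has to be resolved by hand.

```python
from xml.dom import minidom

# Flat form of the example. The <topics> wrapper is an assumption added
# so the document has a single root element.
FLAT = """<topics>
<topic id="1" name="joe">
</topic>
<topic id="2" name="joe's kid" parent="1">
</topic>
</topics>"""

doc = minidom.parseString(FLAT)

# The parser knows nothing about the "parent" attribute, so build an
# id -> element index ourselves.
by_id = {el.getAttribute("id"): el for el in doc.getElementsByTagName("topic")}


def parent_of(el):
    """Resolve the hand-rolled parent attribute; None for top-level topics."""
    parent_id = el.getAttribute("parent")  # "" when the attribute is absent
    return by_id.get(parent_id) if parent_id else None


kid = by_id["2"]
print(parent_of(kid).getAttribute("name"))  # joe

# The parser's own parent link only points at the wrapper element.
print(kid.parentNode.tagName)  # topics
```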

Besides, the nested example is more intuitive, in my opinion.

I'm not sure about memory usage for massively nested trees as opposed to a
flatter structure like the parent-attribute example. Intuitively I'd think
nesting would use more, but after thinking about it a bit I don't believe
so: the parser already has to manage the parent-child relationships either
way, so it shouldn't care where each node sits in memory (similar to how a
file system works with pointers, etc). I'm not sure whether it works out
relationships as you query the objects, i.e. "on a need-to-know basis" (more
memory efficient?), or whether it generates a map up front and stores it
somewhere, similar to a FAT (faster query performance?).

Hopefully that last paragraph made sense -- I have a loud paper shredder
running right next to me, hard to think. :)

Bottom line: I prefer the nested form. And though I'm not a C++ programmer
nor a parser creator, I'd think the nested form would be just fine with a
parser as well.

If I'm wrong please somebody step up and expose my ignorance to the light of
truth. ;)

HTH,
-dave