[thelist] [PHP] parsing xml
Simon Willison
cs1spw at bath.ac.uk
Mon Aug 25 09:31:44 CDT 2003
Paul Bennett wrote:
> The parser I have developed is very simple and works. However, when I
> received a sample file from said "other application" developers, I
> noticed that there are multiple instances of the same tag which would
> currently choke my parser.
>
> Example 1:
> The parent tag for each major data grouping is:
> "Programme"
> This tag name also appears under the node "Fees", thus my parser thinks
> that when it hits this tag that the programme information is complete
> and all heckfire breaks loose.
>
> Example 2:
> The title of each "Programme" is contained within the tag "Title", but
> this tag also is present under a node named "Careers"
> [Insert sobbing sound here]
I presume you're using SAX parsing for this. You've hit on one of the
trickier aspects of developing a SAX parser, but the problem is not at
all impossible to solve. The trick is to use variables to keep track of
where you are in the document. For example, have a variable somewhere
called $inFees which is set to true when the parser sees an open <Fees>
tag and false when it sees the corresponding end tag. I like to put my
SAX parsers inside a class using xml_set_object() as it lets me use
class properties for this kind of thing rather than having to use global
variables.
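A minimal sketch of the flag approach, in case it helps. The sample
document, the class name FlagParser and the root element Programmes are
made up for illustration; only Programme, Fees and Title come from your
description. It registers the handlers as array callables, which do the
same job as xml_set_object() here and should work on recent PHP versions
too.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = <<<XML
<Programmes>
  <Programme>
    <Title>Computer Science</Title>
    <Fees>
      <Programme>Full-time</Programme>
    </Fees>
  </Programme>
</Programmes>
XML;

class FlagParser
{
    public $inFees = false;      // true between <Fees> and </Fees>
    public $programmeCount = 0;  // counts top-level programmes only

    public function parse($xml)
    {
        $parser = xml_parser_create();
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler(
            $parser,
            array($this, 'startTag'),
            array($this, 'endTag')
        );
        return xml_parse($parser, $xml, true) === 1;
    }

    public function startTag($parser, $name, $attrs)
    {
        if ($name === 'Fees') {
            $this->inFees = true;
        } elseif ($name === 'Programme' && !$this->inFees) {
            // A real programme, not the Programme tag nested under Fees.
            $this->programmeCount++;
        }
    }

    public function endTag($parser, $name)
    {
        if ($name === 'Fees') {
            $this->inFees = false;
        }
    }
}

$p = new FlagParser();
$p->parse($xml);
echo $p->programmeCount, PHP_EOL;
```

The same pattern would need a second flag (say $inCareers) to tell the
two kinds of Title apart, which is where the flags start to pile up.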
An alternative approach to having a bunch of hard-coded boolean
variables is to use a stack: an array onto which you push tag names
(array_push()) when you see them and from which you pop them
(array_pop()) when you reach their end tag. Your logic for dealing with
Programme and Title can then look at the stack to figure out what kind
of tag it actually is.
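Here is a rough sketch of the stack idea, again under assumptions: the
sample XML, the class name StackParser, the path() helper and the
Programmes root element are all invented for the example. It collects
only the Title that sits directly under a top-level Programme and
ignores the one under Careers.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = '<Programmes><Programme><Title>Computer Science</Title>'
     . '<Careers><Title>Software Engineer</Title></Careers>'
     . '<Fees><Programme>Full-time</Programme></Fees>'
     . '</Programme></Programmes>';

class StackParser
{
    public $stack = array();   // names of currently open tags
    public $titles = array();  // programme titles only

    public function parse($xml)
    {
        $parser = xml_parser_create();
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler(
            $parser,
            array($this, 'startTag'),
            array($this, 'endTag')
        );
        xml_set_character_data_handler($parser, array($this, 'text'));
        return xml_parse($parser, $xml, true) === 1;
    }

    public function startTag($parser, $name, $attrs)
    {
        array_push($this->stack, $name);
        if ($this->path() === 'Programmes/Programme/Title') {
            $this->titles[] = '';  // start a new programme title
        }
    }

    public function endTag($parser, $name)
    {
        array_pop($this->stack);
    }

    public function text($parser, $data)
    {
        // Character data can arrive in several chunks, so append.
        if ($this->path() === 'Programmes/Programme/Title') {
            $this->titles[count($this->titles) - 1] .= $data;
        }
    }

    private function path()
    {
        return implode('/', $this->stack);
    }
}

$p = new StackParser();
$p->parse($xml);
print_r($p->titles);
```

Comparing the whole path is the bluntest way to use the stack; checking
just the parent (the second-to-last entry) would also work if the root
element name is not fixed.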
With a bit of work keeping track of where you are in the document at any
one time, you should be able to solve your problem. The easier
alternative, though, is to use the DOM extension instead, if you've got
it installed.
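For comparison, a small sketch of the DOM route. It assumes the
DOMDocument and DOMXPath classes from the DOM extension in later PHP
versions, whose API may differ from whatever DOM extension you have
installed, and it uses the same made-up sample document and Programmes
root element as an illustration.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = '<Programmes><Programme><Title>Computer Science</Title>'
     . '<Careers><Title>Software Engineer</Title></Careers>'
     . '<Fees><Programme>Full-time</Programme></Fees>'
     . '</Programme></Programmes>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The XPath expression names the exact position in the tree, so the
// Title under Careers and the Programme under Fees are never matched.
$xpath = new DOMXPath($doc);
$titles = array();
foreach ($xpath->query('/Programmes/Programme/Title') as $node) {
    $titles[] = $node->textContent;
}

print_r($titles);
```

Because the whole tree is in memory, the position of each tag is part of
the query and there is no state to track by hand. The trade-off is
memory use on very large files.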
Hope that helps,
Simon Willison