[thelist] [PHP] parsing xml

Simon Willison cs1spw at bath.ac.uk
Mon Aug 25 09:31:44 CDT 2003


Paul Bennett wrote:
> The parser I have developed is very simple and works. However, when I 
> received a sample file from said "other application" developers, I 
> noticed that there are multiple instances of the same tag which would 
> currently choke my parser.
> 
> Example 1:
> The parent tag for each major data grouping is:
> "Programme"
> This tag name also appears under the node "Fees", thus my parser thinks 
> that when it hits this tag that the programme information is complete 
> and all heckfire breaks loose.
> 
> Example 2:
> The title of each "Programme" is contained within the tag "Title", but 
> this tag also is present under a node named "Careers"
> [Insert sobbing sound here]

I presume you're using SAX parsing for this. You've hit on one of the 
trickier aspects of developing a SAX parser, but the problem is not at 
all impossible to solve. The trick is to use variables to keep track of 
where you are in the document. For example, have a variable somewhere 
called $inFees which is set to true when the parser sees an open <Fees> 
tag and false when it sees the corresponding end tag. I like to put my 
SAX parsers inside a class using xml_set_object() as it lets me use 
class properties for this kind of thing rather than having to use global 
variables.
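
Here is a rough sketch of the flag approach, not a drop-in solution. The 
class name ProgrammeCounter and the sample document are made up for 
illustration; only the Programme and Fees tag names come from your 
description. The handlers are passed as [$this, 'method'] callables, 
which does the same job as xml_set_object() and avoids the deprecation 
notice that recent PHP versions raise for it.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = <<<XML
<Programmes>
  <Programme>
    <Title>Computer Science</Title>
    <Fees>
      <Programme>Full-time</Programme>
    </Fees>
  </Programme>
  <Programme>
    <Title>Mathematics</Title>
  </Programme>
</Programmes>
XML;

class ProgrammeCounter
{
    public $inFees = false;  // true while the parser is inside <Fees>...</Fees>
    public $programmes = 0;  // number of major <Programme> groupings seen

    public function parse($xml)
    {
        $parser = xml_parser_create();
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler(
            $parser,
            [$this, 'startElement'],
            [$this, 'endElement']
        );
        return xml_parse($parser, $xml, true) === 1;
    }

    public function startElement($parser, $name, $attrs)
    {
        if ($name === 'Fees') {
            $this->inFees = true;
        } elseif ($name === 'Programme' && !$this->inFees) {
            // Only a <Programme> outside <Fees> starts a new data grouping.
            $this->programmes++;
        }
    }

    public function endElement($parser, $name)
    {
        if ($name === 'Fees') {
            $this->inFees = false;
        }
    }
}

$counter = new ProgrammeCounter();
$counter->parse($xml);
echo $counter->programmes, "\n";
```

The <Programme> inside <Fees> is ignored because $inFees is true at that 
point, so only the two outer groupings are counted.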

An alternative to a bunch of hard-coded boolean variables is to use a 
stack: an array onto which you array_push() each tag name when you see 
its start tag, and from which you array_pop() it when you reach its end 
tag. Your logic for dealing with Programme and Title can then look at 
the stack to work out which kind of tag it actually is.
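
A minimal sketch of the stack version follows. Again, TitleCollector, 
parentTag() and the sample XML are hypothetical names I've picked for 
the example; the idea is just that a tag's meaning is decided by the 
entry sitting below it on the stack.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = <<<XML
<Programmes>
  <Programme>
    <Title>Computer Science</Title>
    <Careers>
      <Title>Software Developer</Title>
    </Careers>
    <Fees>
      <Programme>Full-time</Programme>
    </Fees>
  </Programme>
</Programmes>
XML;

class TitleCollector
{
    public $stack = [];      // names of the currently open tags, outermost first
    public $titles = [];     // text of each <Title> directly under a <Programme>
    public $programmes = 0;  // <Programme> tags that are not inside <Fees>
    private $buffer = '';    // character data of the innermost open tag

    public function parse($xml)
    {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, [$this, 'startElement'], [$this, 'endElement']);
        xml_set_character_data_handler($parser, [$this, 'characterData']);
        return xml_parse($parser, $xml, true) === 1;
    }

    // Name of the tag enclosing the one on top of the stack, or null at the root.
    private function parentTag()
    {
        $n = count($this->stack);
        return $n >= 2 ? $this->stack[$n - 2] : null;
    }

    public function startElement($parser, $name, $attrs)
    {
        array_push($this->stack, $name);
        $this->buffer = '';
        if ($name === 'Programme' && $this->parentTag() !== 'Fees') {
            $this->programmes++;
        }
    }

    public function characterData($parser, $data)
    {
        // Text can arrive in several chunks, so append rather than assign.
        $this->buffer .= $data;
    }

    public function endElement($parser, $name)
    {
        if ($name === 'Title' && $this->parentTag() === 'Programme') {
            $this->titles[] = trim($this->buffer);
        }
        array_pop($this->stack);
        $this->buffer = '';
    }
}

$collector = new TitleCollector();
$collector->parse($xml);
print_r($collector->titles);
```

The <Title> under <Careers> is skipped because its parent on the stack 
is Careers rather than Programme, and the stack is empty again once the 
root element closes.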

With a bit of work keeping track of where you are in the document at any 
one time, you should be able to solve your problem. The easier 
alternative, though, is to use the DOM extension instead, if you've got 
it installed: it loads the whole document into a tree, so every node 
already knows its parent and children.
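
For comparison, a rough sketch of the DOM route. It assumes the 
DOMDocument class from the PHP 5+ DOM extension rather than the older 
domxml functions, and the sample XML is once more invented for the 
example.

```php
<?php
// Hypothetical sample document; the layout is assumed from the question.
$xml = <<<XML
<Programmes>
  <Programme>
    <Title>Computer Science</Title>
    <Careers>
      <Title>Software Developer</Title>
    </Careers>
    <Fees>
      <Programme>Full-time</Programme>
    </Fees>
  </Programme>
  <Programme>
    <Title>Mathematics</Title>
  </Programme>
</Programmes>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$titles = [];
foreach ($doc->documentElement->childNodes as $programme) {
    // Skip whitespace text nodes and anything that is not a <Programme>.
    if ($programme->nodeType !== XML_ELEMENT_NODE || $programme->nodeName !== 'Programme') {
        continue;
    }
    // Look only at direct children, so <Careers><Title> is never reached.
    foreach ($programme->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'Title') {
            $titles[] = trim($child->textContent);
        }
    }
}

print_r($titles);
```

Because the loops only walk direct children, the nested Programme and 
Title tags never get confused with the top-level ones, and there is no 
state to track by hand.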

Hope that helps,

Simon Willison



