[thelist] perl HTML::Parser question

Sam-I-Am sam at sam-i-am.com
Mon Oct 8 11:07:13 CDT 2001


Is there anyone on the list familiar with this?

I'm just playing around with a parser for prettifying/standardizing
markup layout (to __my__ rules, not HTML Tidy's or anyone else's), but
I'm finding that XHTML-style self-closing elements (e.g. <img src=""
/>) don't trigger the start tag handler; they get handled as text
instead.
This is a big problem. I'd like to bring HTML/Parser.pm up to date so
that it accepts a /> ending. Surely someone has done this already?

I'm poking around in Parser.pm, and I can see a bit in there (~ line
106) that checks whether the tag is ended and otherwise treats it as
text, but it only looks for >.
Does anyone have any insight into this, or can anyone suggest a fix? I
can't use XML::Parser, as most of the markup is plain HTML, and I don't
want to run it through Raggett's Tidy, as that throws away some
important code formatting.

(Actually, while I'm at it: HTML::Parser lowercases all attribute names.
This might become a problem too, as there are times when I need
onMouseOver rather than onmouseover.)
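For reference, a minimal sketch of a handler setup that exercises both problems. It assumes the version 3 API of HTML::Parser and a release that provides the empty_element_tags and case_sensitive options (later 3.x releases document both; whether a given install has them is an assumption). With those options set, <img ... /> should report a start event plus an artificial end event, and attribute names should keep their source case:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Sample markup: an XHTML-style empty element with a mixed-case attribute.
my $html = '<p><img src="a.gif" onMouseOver="hi()" /></p>';

my @events;    # start/end events collected in document order

my $p = HTML::Parser->new(
    api_version => 3,
    # Start tags: record the tag name and attribute names in source order.
    start_h => [ sub {
        my ($tag, $attrseq) = @_;
        push @events, "start $tag [@$attrseq]";
    }, 'tagname, attrseq' ],
    # End tags, including the artificial end event generated for "/>".
    end_h => [ sub { push @events, "end $_[0]" }, 'tagname' ],
);

# Assumption: these options exist in the installed HTML::Parser release.
$p->empty_element_tags(1);  # recognize "<img ... />" as start + end
$p->case_sensitive(1);      # keep onMouseOver rather than onmouseover

$p->parse($html);
$p->eof;                    # flush any buffered input

print "$_\n" for @events;
```

attrseq is requested instead of attr so the original attribute order survives, which matters for a layout prettifier that should not reshuffle attributes.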

thanks
Sam



