[thelist] perl HTML::Parser question
Sam-I-Am
sam at sam-i-am.com
Mon Oct 8 11:07:13 CDT 2001
Is there anyone on the list familiar with this?
I'm just playing around with a parser for prettifying/standardizing
markup layout (to __my__ rules, not HTML Tidy's or anyone else's), but
I'm finding that any XHTML-style self-closing elements (e.g. <img src=""
/>) don't trigger the start tag handler, and get handled as text
instead.
This is a big problem. I'd like to just bring HTML/Parser.pm up to date
so that it accepts a /> ending. Surely someone has done this already?
I'm poking around in Parser.pm and I can see a bit in there (around
line 106) that checks whether the tag can be ended and otherwise treats
it as text, but it only looks for >.
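
The only stopgap I can think of so far is to rewrite the /> endings
before the parser ever sees them. A rough sketch in plain Perl, not a
patch to Parser.pm: strip_self_closing is a name I made up, and it
assumes attribute values never contain a literal > character.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser): rewrite XHTML-style
# "<tag ... />" as "<tag ...>" so that a parser which only looks for ">"
# sees an ordinary start tag.
# Assumption: no attribute value contains a literal ">".
sub strip_self_closing {
    my ($html) = @_;
    # "<", a tag name starting with a letter, anything up to the first
    # optional-whitespace-plus-"/>" sequence, without crossing "<" or ">".
    $html =~ s{<([A-Za-z][^<>]*?)\s*/>}{<$1>}g;
    return $html;
}

print strip_self_closing('<p>logo: <img src="logo.gif" /><br/></p>'), "\n";
# prints: <p>logo: <img src="logo.gif"><br></p>
```

The catch is that this throws away the fact that the tag was
self-closed, so I'd have no way to write the /> back out afterwards,
which is why I'd still prefer a proper fix in the parser.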
Does anyone have any insight into this, or can anyone suggest a fix? I
can't use XML::Parser as most of the markup is plain HTML. I don't want
to run it through Raggett's Tidy as it throws away some important code
formatting.
(Actually, while I'm at it: HTML::Parser lowercases all attribute names.
This might become a problem too, as there are times when I need
onMouseOver rather than onmouseover.)
thanks
Sam