[thelist] XHTML or HTML WAS Good Examples of XHTML Usage

Timothy J. Luoma luomat at operamail.com
Fri Sep 5 22:36:07 CDT 2003


On Thu, 4 Sep 2003 19:26:46 +0000 (UTC), Ian Hickson <ian at hixie.ch> wrote:

>
> This e-mail was forwarded to me by Gary -- thanks Gary! -- who thought I
> should have a chance to respond. So:
>
> Timothy J. Luoma wrote:
>>
>> ...let's address http://www.hixie.ch/advocacy/xhtml
>>
>> > Current UAs are HTML user agents (at best) and certainly not XHTML
>> > user agents (certainly not when sent as text/html), so if you send
>> > them XHTML you are sending them content in a language which is not
>> > native to them, and relying on their error handling.
>>
>> Show me a browser that can handle HTML and not XHTML.
>
> Windows IE 6.
>
> Testcase: http://www.hixie.ch/tests/adhoc/xhtml/mime-type/001

I'll take your word for it, since I'm partially color-blind.

How about I restate this better... show me a real, in-the-wild example of 
an XHTML file sent as text/html that causes a browser to fail.


>> > * <script> and <style> elements in XHTML may not have their contents
>> > commented out, a trick frequently used in HTML documents to hide
>> > the contents of such elements from legacy UAs. [1]
>>
>> Sure they can, you just have to do it the right way instead of the old 
>> way.
>
> I'm curious, could you describe this method? I searched Google, but all I
> could find were suggestions that wouldn't actually work for various
> reasons. David Baron and I think that you may mean:
>
>    <script ...>
>    <!-- --> <![CDATA[ <!--
>    ...
>    // --> ]]>
>    </script>

I think I was referring to the "Evil Mangled Comments Embedding Hack"

   <script type="text/javascript"><!--//--><![CDATA[//><!--
   ...
   //--><!]]></script>

   <style type="text/css"><!--/*--><![CDATA[/*><!--*/
   ...
   /*]]>*/--></style>

> ...but that is likely to cause legacy UAs to choke on the <![CDATA[ and
> ]]> bits, so is as good as not having anything (which is probably wisest
> anyway, since there are basically no UAs that can't parse <script> right
> nowadays, even in the mobile device market).

I bet my Treo could even deal with XHTML sent as HTML.


>> > * XHTML documents that use the "/>" notation, as in "<link />", are
>> > not valid HTML documents. (See the third bullet point in the
>> > section entitled "The Myth of "HTML-compatible XHTML 1.0
>> > documents"".)
>>
>> Show me a browser that can't handle <link /> but can handle <link>
>
> No browsers handle <link/> according to the HTML4 spec.

That's not what I asked.


>> > * Documents sent as text/html are handled as tag soup [2] by most UAs.
>> > Since most authors only check their documents using one or two UAs,
>> > rather than using a validator, this means that authors are not
>> > checking for validity, and thus most XHTML documents on the web now
>> > are invalid.
>>
>> I dispute this claim.  Where is the data backing it up?
>
> http://www.goer.org/Journal/2003/Apr/index.html#results

Well good.  That would be useful to add to the original argument.



>> I would argue that most people who bother going to the extent of using
>> XHTML care enough to use a validator.
>
> In my experience, most people using XHTML do so by copying and pasting a
> DOCTYPE from another site.

I must hang out with a different crowd ;-)

However, as more people write XHTML, it will no doubt become a bigger 
problem.



>> > Therefore the main advantage of using XHTML, that it
>> > has to be valid, is lost if the document is then sent as text/html.
>> > (Yes, I said _most_ authors. If you are one of the few authors who
>> > understands how to avoid the issues raised in this document and
>> > does validate all their markup, then this document probably does
>> > not apply to you -- see Appendix B.)
>>
>> I would say <em>one</em> advantage is lost, but it is not the main one, 
>> to me.
>
> Yet in your e-mail, you started off by saying:
>
> | XHTML because I believe the forced closing of tags is a good thing.  It
> | makes you write better code and makes it easier to find mistakes.  Plus
> | you have the option of sending it as text/xml and seeing parsing errors
> | even before you hit the validator.
>
> That was the first thing you mentioned!

No, the first thing I mentioned was that it was easier to find mistakes.


>> > * If you ever switch your XHTML documents from text/html to text/xml,
>> > then you will in all likelihood end up with a considerable number
>> > of XML errors, meaning your content won't be readable by users.
>>
>> If you do decide to switch to text/xml, it will be so many years down 
>> the road that IE 6 is no longer an issue.
>
> IE6 will be the only browser from Microsoft until 2005 at the earliest
> (according to Microsoft [1]). Given that history has shown us that it
> takes over two years for Microsoft browsers to be phased out [2], it will
> be an issue until at least 2007. This is assuming that the next version
> of Windows will be adopted at the same rate as new versions of IE, and
> that the new version of IE in that version of Windows contains XHTML
> support, neither of which is very likely.
>
> However, the point still stands -- regardless of IE6's existence, invalid
> XHTML pages sent as text/html _now_ won't be flagged by any browsers (per
> the HTMLWG's request that text/html documents be parsed as HTML, not XML
> [3]) and thus regardless of the existence of IE, when you switch to an
> XML MIME type, you will hit the aforementioned problems if you have
> invalid documents, which, as previously documented, is very likely.

What I meant was that no one is going to send XHTML as XML until well 
after IE6 is dead, and I mean Netscape 3 dead. That is so far in the 
future that it is not worth worrying about.


>> You will only get errors then if you have errors now.  So validate.  
>> (You need to do that with HTML4 too.)
>
> Validating will not catch all the errors. For example, it will not catch
> scripts that use the DOM1 methods and not the DOM2 namespace-aware
> methods, or incorrect commenting out of scripts.

Could be... I don't do JavaScript, so I have to claim ignorance there.  I 
thought we were talking about XHTML vs. HTML.
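
For anyone following along, here's a minimal sketch of the kind of 
difference I take Ian to mean (the example is mine, not his, so treat it 
as an illustration, not gospel):

   <script type="text/javascript">
   // DOM1 method: in a document parsed as XML (real XHTML), this
   // historically creates an element in *no* namespace, so it is not
   // treated as an XHTML <p> even though the markup validates:
   var p1 = document.createElement("p");

   // DOM2 namespace-aware method, which works in both contexts:
   var p2 = document.createElementNS("http://www.w3.org/1999/xhtml", "p");
   </script>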


>> > * A CSS stylesheet written for an HTML document is interpreted
>> > slightly differently in an XHTML context (e.g. the <body> element
>> > is not magical in XHTML, tag names must be written in lowercase in
>> > XHTML).
>>
>> If you are writing XHTML, you are probably aware of that when you write
>> CSS.
>
> Most XHTML authors don't even know they're writing XHTML. Most authors 
> are _not_ aware of this. Were you?

Sure I was... but I use UltraEdit, so I'm pretty much aware of what I'm 
writing.
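
To make the CSS point concrete, here's a small made-up example (mine, 
not Ian's):

   /* Matches in HTML, where element names in selectors are
      case-insensitive; in XHTML the element is literally "body",
      matching is case-sensitive, and this rule silently stops applying: */
   BODY { margin: 0; }

   /* Safe in both: */
   body { margin: 0; }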


>> > * A DOM-based script written for an HTML document has subtly
>> > different semantics in an XHTML context (e.g. element names are
>> > case insensitive and returned in uppercase in HTML, case sensitive
>> > and always lowercase in XHTML).
>>
>> Hrm.  So in HTML I have case insensitivity or UPPERCASE, but in XHTML I
>> always have lowercase.  Great, I'll use XHTML!
>
> But even your XHTML scripts, when run in an HTML environment, as they
> will when sent as text/html, will use the uppercase version. So you do
> _not_ gain consistency by using XHTML if you send it as text/html.

Again, I'll take your word for it, as I don't do JS.  The original 
argument would be stronger if it made this clear.
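
For reference, here is what I understand the difference to look like in 
practice (my sketch, assuming a <p> appears earlier in the page):

   <script type="text/javascript">
   var el = document.getElementsByTagName("p")[0];
   // Sent as text/html:             alerts "P" (HTML DOM, uppercase)
   // Sent as application/xhtml+xml: alerts "p" (case preserved)
   alert(el.tagName);
   </script>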


> It does appear that since I wrote that part of the document, UAs have
> become much more proactive about adding extensions to solve this
> problem, which is encouraging. The problem still occurs if the file is
> downloaded by other means, though, so that there is no extension, so
> that the extension triggers the UA content-sniffing behaviour, or so
> that the extension is associated with an XML type (e.g. .xhtml).


>> And again, the validation errors shouldn't be there regardless of what
>> language you are using.
>
> Naturally. But unfortunately, most authors seem to forget this.
>
> For example, the author of this page:
>
>    http://www.joinwow.org/learningcenter/markup/articles/2003/m200302.asp
>
> ...claims that the page is XHTML, but if you actually try to parse it
> as XHTML in an XHTML-aware Web browser like Mozilla, for example by
> changing the MIME type to application/xhtml+xml:
>
>    http://software.hixie.ch/utilities/cgi/content-type-proxy/content-type-proxy?uri=http://www.joinwow.org/learningcenter/markup/articles/2003/m200302.asp&type=application/xhtml%2Bxml
>
> ...you get a parsing error.
>
> Should that site ever switch to an XML MIME type, their site will
> break quite dramatically. As it seems unlikely that anyone is going to
> go through all their pages fixing them, they might as well never have
> used XHTML, since they won't benefit from its only real advantage
> (stricter parsing). And so they might as well have just stuck with
> good old HTML4, which at least wouldn't be depending on UA-specific
> error handling behaviour, such as how to handle "<link/>".

All I can say is that the page was valid when I submitted it; it didn't 
have the invalid SGML characters, and the invalid (non-XHTML) markup was 
the navigation stuff at the side of the page, which I had nothing to do 
with.

But thanks for peddling an old article of mine for me, I appreciate the 
free publicity.


>> Perhaps I should write an article "Not using file extensions considered
>> harmful"
>
> Doing so would probably be very educational, as you would without a
> doubt receive many e-mails from people all over the planet explaining
> why URIs should not include extensions.

I doubt it would be educational at all, since I haven't used file 
extensions in years and I already know the arguments for not using them.  
However, in practice, a page without an extension failed in one of the 
three browsers tested (IE and Opera were OK, Mozilla was not).

(I also didn't like the .asp at the top of the article above, but if you 
check my personal site you will see that I do not use file extensions at 
all, and most of my pages validate.  The ones that don't are usually 
because of some MovableType unencoded & or some such.)


>> Yeah I know he says that HTML compatible XHTML is a myth.  Again, show 
>> me a real world example of it breaking anywhere.
>
> The simplest example is the XHTML "<br/>" which in HTML means "<br>&gt;".

Is there a browser that actually renders <br /> as <br>&gt; ?  (That 
reading comes from SGML's null end-tag shorthand: <br/ is already a 
complete element, so a strictly conforming HTML4 parser would treat the 
trailing > as character data.)

My point was that there are a lot of theoretical problems with it, but 
far fewer practical ones.



> In conclusion: on the one hand you claim that using XHTML means
> requiring a validator, and that validity is key, yet on the other, you
> claim that so long as things work in most browsers, irrespective of
> the specs, then it's ok. This seems inconsistent.

A foolish consistency is the hobgoblin of little minds.  We live in an 
inconsistent world.  Valid markup doesn't always work, invalid markup 
doesn't always fail.  Browsers are forced to deal with bad code because 
there is a gargantuan amount of it out there that will never, ever be 
cleaned up.  Wireless browsers are going to have to do the same thing, and 
handheld devices are gaining speed and processing power... the browsers 
therein are going to have to deal with the same crappy code.  It's a fact 
of life.

XHTML pushes me towards writing stricter code.  Do I do it perfectly?  No, 
but it makes me try harder.

I have been writing XHTML for a couple of years for personal projects, 
but when I got my Treo handheld, I realized that pages would download 
faster if I wrote extremely bad HTML.  In that situation, writing really 
bad HTML was more important to me because speed is paramount.  The irony 
is not lost on me, nor is the reality of the inconsistencies we live 
with.

Look, at the end of the day, talking about the difference between HTML 
and XHTML is duller than toejam.  Arguing against sending XHTML as 
text/html is futile when actually using any of the XHTML MIME types is 
purely impractical.

I also said that XHTML is easier to parse and eases the transition 
toward understanding some XML concepts.

I still see no practical reason not to teach XHTML + a validator... and no 
practical reason not to follow the W3C spec that says that XHTML 1.0 docs 
can be sent as text/html *since we really have no other option*.  If you 
use HTML4 instead of XHTML you aren't going to get parsing warnings either.


Another part of the inconsistency is that I wrote a PHP snippet that 
sends application/xhtml+xml when the browser is Mozilla or Opera.  Now 
that Opera 7.2 sends the right HTTP_ACCEPT headers, I will probably 
start using it more often.  So when I am looking at my own site from my 
own computer, I will be getting application/xhtml+xml (or whatever that 
dreadful MIME type is).  That way I get the benefit of XHTML without 
having to inflict it on other people.
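
The snippet itself is nothing fancy; the idea is roughly this (a minimal 
sketch from memory, not the exact code):

   <?php
   // Serve application/xhtml+xml only to browsers that explicitly
   // advertise it in their Accept header (e.g. Mozilla, Opera 7.2+):
   $accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
   if (strpos($accept, 'application/xhtml+xml') !== false) {
       header('Content-Type: application/xhtml+xml');
   } else {
       header('Content-Type: text/html');
   }
   ?>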

My reasoning is that if I do miss a validating/parsing error, I'd rather 
have a visitor be able to see the page than be forced to look at an error 
page.

So there we go.

TjL

