[thelist] XHTML or HTML WAS Good Examples of XHTML Usage

Thu Sep 4 14:26:46 CDT 2003

This e-mail was forwarded to me by Gary -- thanks Gary! -- who thought I
should have a chance to respond. So:

Timothy J. Luoma wrote:
>
> ...let's address http://www.hixie.ch/advocacy/xhtml
>
> > Current UAs are HTML user agents (at best) and certainly not XHTML
> > user agents (certainly not when sent as text/html), so if you send
> > them XHTML you are sending them content in a language which is not
> > native to them, and relying on their error handling.
>
> Show me a browser that can handle HTML and not XHTML.

Windows IE 6.

Testcase: http://www.hixie.ch/tests/adhoc/xhtml/mime-type/001

> > * <script> and <style> elements in XHTML may not have their contents
> > commented out, a trick frequently used in HTML documents to hide
> > the contents of such elements from legacy UAs. [1]
>
> Sure they can, you just have to do it the right way instead of the old way.

I'm curious, could you describe this method? I searched Google, but all I
could find were suggestions that wouldn't actually work for various
reasons. David Baron and I think that you may mean:

   <script ...>
   <!-- --> <![CDATA[ <!--
   ...
   // --> ]]>
   </script>

...but that is likely to cause legacy UAs to choke on the <![CDATA[ and
]]> bits, so it is no better than having nothing at all (which is
probably wisest anyway, since basically no UAs fail to parse <script>
correctly nowadays, even in the mobile device market).

> > * XHTML documents that use the "/>" notation, as in "<link />", are
> > not valid HTML documents. (See the third bullet point in the
> > section entitled "The Myth of "HTML-compatible XHTML 1.0
> > documents"".)
>
> Show me a browser that can't handle <link /> but can handle <link>

No browsers handle <link/> according to the HTML4 spec. (According to
HTML4, <link/> is the same as "<link>&gt;" due to the SGML SHORTTAG NET
feature.)
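
As a rough illustration of that reading, here is a toy sketch in Python.
It is not an SGML parser: the function name read_as_html4 and the short
element list are made up for this example, and it only covers empty
elements written with the XML "/>" notation.

```python
import re

# Toy model only: rewrites XML-style empty-element tags the way HTML4's
# SGML declaration (SHORTTAG NET) would read them. Real SGML parsing is
# far more involved; the element list is an illustrative subset.
VOID_ELEMENTS = ("br", "hr", "img", "input", "link", "meta")

_SELF_CLOSED = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_ELEMENTS), re.IGNORECASE
)


def read_as_html4(markup: str) -> str:
    """Return what the markup means under a strict HTML4 (SGML) reading."""
    # The "/" ends the start tag, so the ">" is left over as character data.
    return _SELF_CLOSED.sub(r"<\1\2>&gt;", markup)


print(read_as_html4("<link/>"))
print(read_as_html4('<link rel="stylesheet" href="a.css" />'))
```

Under this model "<link/>" comes out as "<link>&gt;", which is the stray
character that browsers quietly drop instead of rendering.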

> > * Documents sent as text/html are handled as tag soup [2] by most UAs.
> > Since most authors only check their documents using one or two UAs,
> > rather than using a validator, this means that authors are not
> > checking for validity, and thus most XHTML documents on the web now
> > are invalid.
>
> I dispute this claim.  Where is the data backing it up?

http://www.goer.org/Journal/2003/Apr/index.html#results

> I would argue that most people who bother going to the extent of using
> XHTML care enough to use a validator.

In my experience, most people using XHTML do so by copying and pasting a
DOCTYPE from another site.

> > Therefore the main advantage of using XHTML, that it
> > has to be valid, is lost if the document is then sent as text/html.
> > (Yes, I said _most_ authors. If you are one of the few authors who
> > understands how to avoid the issues raised in this document and
> > does validate all their markup, then this document probably does
> > not apply to you -- see Appendix B.)
>
> I would say <em>one</em> advantage is lost, but it is not the main one, to
> me.

Yet in your e-mail, you started off by saying:

| XHTML because I believe the forced closing of tags is a good thing.  It
| makes you write better code and makes it easier to find mistakes.  Plus
| you have the option of sending it as text/xml and seeing parsing errors
| even before you hit the validator.

That was the first thing you mentioned!

> > * If you ever switch your XHTML documents from text/html to text/xml,
> > then you will in all likelihood end up with a considerable number
> > of XML errors, meaning your content won't be readable by users.
>
> If you do decide to switch to text/xml, it will be so many years down the
> road that IE 6 is no longer an issue.

IE6 will be the only browser from Microsoft until 2005 at the earliest
(according to Microsoft [1]). Given that history has shown us that it
takes over two years for Microsoft browsers to be phased out [2], it will
be an issue until at least 2007. This is assuming that the next version of
Windows will be adopted at the same rate as new versions of IE, and that
the new version of IE in that version of Windows contains XHTML support,
neither of which is very likely.

However, the point still stands -- regardless of IE6's existence, invalid
XHTML pages sent as text/html _now_ won't be flagged by any browsers (per
the HTMLWG's request that text/html documents be parsed as HTML, not XML
[3]) and thus regardless of the existence of IE, when you switch to an XML
MIME type, you will hit the aforementioned problems if you have invalid
documents, which, as previously documented, is very likely.

[1] http://www.microsoft.com/technet/treeview/default.asp?url=/technet/itcommunity/chats/trans/ie/ie0507.asp
[2] http://www.google.com/press/zeitgeist/jul03_browsers.gif
[3] http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
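
The difference between the two parsing modes can be sketched with
Python's standard library standing in for a browser. This is only an
assumption-laden illustration: PAGE is a hypothetical document, the
helper names are invented, and an XML parser checks well-formedness,
not DTD validity.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page: claims to be XHTML but never closes its <p> or <br>.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Test</title></head>"
    "<body><p>First paragraph<br>Second line</body></html>"
)


def parses_as_xml(markup: str) -> bool:
    """True if a strict XML parser accepts the markup (well-formedness only)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parses_as_tag_soup(markup: str) -> bool:
    """True if Python's lenient HTML parser gets through the markup."""
    parser = HTMLParser()
    parser.feed(markup)  # does not raise on unclosed or mismatched tags
    parser.close()
    return True


print(parses_as_tag_soup(PAGE))
print(parses_as_xml(PAGE))
```

The lenient parser never complains about the page, so nothing warns the
author while it is served as text/html; the strict parser rejects the
very same bytes.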

> You will only get errors then if you have errors now.  So validate.  (You
> need to do that with HTML4 too.)

Validating will not catch all the errors. For example, it will not catch
scripts that use the DOM1 methods and not the DOM2 namespace-aware
methods, or incorrect commenting out of scripts.
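
To show the kind of script difference I mean, here is a small sketch
using Python's xml.dom.minidom as a stand-in for a browser DOM (an
assumption for illustration; the variable names are mine). A validator
only sees the markup, so it has no way to notice which call a script
makes.

```python
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Build an empty XHTML document with a namespaced <html> root.
doc = getDOMImplementation().createDocument(XHTML_NS, "html", None)

# DOM1-style call: the new element is in no namespace at all.
dom1_p = doc.createElement("p")

# DOM2-style call: the new element is in the XHTML namespace.
dom2_p = doc.createElementNS(XHTML_NS, "p")

print(dom1_p.namespaceURI)
print(dom2_p.namespaceURI)
```

In this DOM the first element has no namespace, so an XML-aware UA has
no reason to treat it as an XHTML paragraph, while the second one
carries the XHTML namespace.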

> > * A CSS stylesheet written for an HTML document is interpreted
> > slightly differently in an XHTML context (e.g. the <body> element
> > is not magical in XHTML, tag names must be written in lowercase in
> > XHTML).
>
> If you are writing XHTML, you are probably aware of that when you write
> CSS.

Most XHTML authors don't even know they're writing XHTML. Most authors are
_not_ aware of this. Were you?

> > * A DOM-based script written for an HTML document has subtly
> > different semantics in an XHTML context (e.g. element names are
> > case insensitive and returned in uppercase in HTML, case sensitive
> > and always lowercase in XHTML).
>
> Hrm.  So in HTML I have case insensitivity or UPPERCASE, but in XHTML I
> always have lowercase.  Great, I'll use XHTML!

But even your XHTML scripts, when run in an HTML environment (as they
will be when sent as text/html), will see the uppercase names. So you
do _not_ gain consistency by using XHTML if you send it as text/html.
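
A loose analogy in Python, with the caveat that Python's HTML parser
folds names to lowercase whereas browser HTML DOMs report them in
uppercase: either way the HTML side normalises case and the XML side
does not. The class and function names here are invented for the
sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records element names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def html_names(markup):
    """Element names as seen through the case-insensitive HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_names(markup):
    """Element names as seen through the case-sensitive XML parser."""
    return [element.tag for element in ET.fromstring(markup).iter()]


print(html_names("<DIV><p>x</p></DIV>"))  # names folded to a single case
print(xml_names("<DIV><p>x</p></DIV>"))   # names kept exactly as written
```

A script that compares element names therefore has to be written for
the parser that will actually be used, which for text/html is the HTML
one.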

> > * If a user saves an XHTML-as-text/html document to disk and later
> > reopens it locally, triggering the content type sniffing code since
> > filesystems typically do not include file type information, the
> > document could be reopened as XML, potentially resulting in
> > validation errors, parsing differences, or styling differences.
>
> HUH?
>
> If I send a document as text/html, you know what it will get saved
> as? HTML! WHOA!
>
> [...] And what is this "filesystems typically do not include file
> type information"?
>
> "The document could be reopened as XML" -- only likely if the person
> saved it as .xml or similar. Most times it will not be.

It does appear that since I wrote that part of the document, UAs have
become much more proactive about adding extensions to solve this
problem, which is encouraging. The problem still occurs, though, if
the file is downloaded by other means and ends up with no extension,
with an extension that triggers the UA's content-sniffing behaviour,
or with an extension associated with an XML type (e.g. .xhtml).
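
Roughly, the decision for a local file looks like the sketch below. The
extension table and the function name local_parser_for are hypothetical;
real UAs and operating systems each keep their own mappings and their
own sniffing rules.

```python
import os.path

# Hypothetical extension table; real UAs and operating systems each keep
# their own, so treat these entries as illustrative only.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xml": "text/xml",
}

XML_TYPES = {"application/xhtml+xml", "text/xml", "application/xml"}


def local_parser_for(filename: str) -> str:
    """Guess which parser a UA might pick for a local file with no MIME type."""
    extension = os.path.splitext(filename)[1].lower()
    mime_type = EXTENSION_TYPES.get(extension)
    if mime_type is None:
        return "sniff"  # no usable extension: fall back to content sniffing
    if mime_type in XML_TYPES:
        return "xml"
    return "html"


for name in ("page.html", "page.xhtml", "page"):
    print(name, local_parser_for(name))
```

The same bytes that were served as text/html can thus land in the XML
parser, or in the sniffing code, once the HTTP header is gone.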

> And again, the validation errors shouldn't be there regardless of what
> language you are using.

Naturally. But unfortunately, most authors seem to forget this.

For example, the author of this page:

   http://www.joinwow.org/learningcenter/markup/articles/2003/m200302.asp

...claims that the page is XHTML, but if you actually try to parse it
as XHTML in an XHTML-aware Web browser like Mozilla, for example by
changing the MIME type to application/xhtml+xml:

   http://software.hixie.ch/utilities/cgi/content-type-proxy/content-type-proxy?uri=http://www.joinwow.org/learningcenter/markup/articles/2003/m200302.asp&type=application/xhtml%2Bxml

...you get a parsing error.

Should that site ever switch to an XML MIME type, it will break
quite dramatically. As it seems unlikely that anyone is going to
go through all their pages fixing them, they might as well never have
used XHTML, since they won't benefit from its only real advantage
(stricter parsing). And so they might as well have just stuck with
good old HTML4, which at least wouldn't be depending on UA-specific
error handling behaviour, such as how to handle "<link/>".

> Perhaps I should write an article "Not using file extensions considered
> harmful"

Doing so would probably be very educational, as you would without a
doubt receive many e-mails from people all over the planet explaining
why URIs should not include extensions.

> Yeah I know he says that HTML compatible XHTML is a myth.  Again, show me
> a real world example of it breaking anywhere.

The simplest example is the XHTML "<br/>" which in HTML means "<br>&gt;".

Note that I didn't say Tag Soup compatible XHTML was a myth. I said
HTML compatible XHTML is a myth.

It is a simple fact that a valid XHTML1 document can never validate as
HTML4, and vice versa. That is all that "HTML compatible XHTML is a
myth" means.

In conclusion: on the one hand you claim that using XHTML means
requiring a validator, and that validity is key, yet on the other, you
claim that so long as things work in most browsers, irrespective of
the specs, then it's ok. This seems inconsistent.

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'