[thelist] evolt 404: The explanation

Judah McAuley judah at alphashop.com
Wed Feb 7 17:08:31 CST 2001


Recap: Search engine spiders don't like query strings. Developers have
been trying to overcome this problem by rewriting dynamic, query-string
URLs into URLs that *look* like plain, static file paths.

For instance,
http://www.bob.com/index.cfm?top_level=catalog&category=shoes&product_id=37
would be re-written as:
http://www.bob.com/top_level/catalog/category/shoes/product_id/37/index.cfm

This looks like a nice deep structured site that spiders would love to
crawl. The big problem is that the big long directory structure doesn't
really exist.
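
To make the mapping concrete, here is a rough sketch in Python (not CF) of
the rewrite in both directions. The function names to_path_url and
to_query_url are made up for illustration, and the sketch assumes simple
keys and values that contain no slashes or characters needing escaping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_path_url(url):
    """Turn a query-string URL into a directory-style URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Split "/index.cfm" into its directory prefix and the template name.
    directory, _, template = parts.path.rpartition("/")
    segments = []
    for key, value in parse_qsl(parts.query):
        # Each key=value pair becomes two path segments: /key/value
        segments += [key, value]
    path = "/".join([directory] + segments + [template])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def to_query_url(url):
    """Recover the query-string URL from the directory-style URL (hypothetical helper)."""
    parts = urlsplit(url)
    pieces = parts.path.strip("/").split("/")
    template = pieces.pop()  # the last segment is the template name
    # The remaining segments alternate key, value, key, value, ...
    pairs = list(zip(pieces[0::2], pieces[1::2]))
    return urlunsplit(
        (parts.scheme, parts.netloc, "/" + template, urlencode(pairs), "")
    )
```

The second function is roughly the job the single handling template has to
do: take the path that doesn't exist and turn it back into the parameters
the real template expects.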

To get around this problem in Cold Fusion, some people have started using
the Missing Template Handler functionality that appeared in CF 4.5. The
Missing Template Handler allows you to specify a single CF template to
handle all instances of 404 errors. Presumably, this feature was
intended to encourage graceful error handling. But, like most things, it
has been perverted to do all sorts of interesting things, like run the
entire site through a single template.

The problem: Using the missing template handler under IIS 5 is pretty
straightforward. First, configure CF to use the file that you want as the
handler. Then you have to tell IIS to not check for the existence of a file
that is requested. This part is important, because CF needs to be involved
with the 404 error instead of IIS returning a generic 404 error. Then CF
runs that template and the result is returned to the browser with no
problems.

However, Evolt runs under Linux/Apache. When
http://www.bob.com/top_level/catalog/category/shoes/product_id/37/index.cfm
is requested, Apache sees that the file index.cfm does not exist in that
directory and it tells CF so. CF launches the template handler and
generates all of the necessary content, and then returns the content to the
browser. Everything seems fine, except for the fact that Apache tacked a
404 status onto the response, so the returned page has the correct
content, but an HTTP status that says the content doesn't exist.

Most browsers appear to ignore the 404 status when the page has content,
but it causes confusion in some browsers. If IE is set to display
"friendly" error messages, it may display the Microsoft-supplied error
page instead of the page content. If you turn off friendly error messages,
the same request suddenly goes from looking like a 404 error to looking
like an actual page, because content really is being returned.

djc tried hacking Apache to write a 200 status code instead of a 404. An
interesting thing happened: CF displayed a 404 error and didn't launch the
Missing Template Handler. This would seem to indicate that CF needs to know
from Apache that there is a missing file before it will launch the Missing
Template Handler.

The solution, then, is to let Apache pass a 404 status to CF, and then use
<cfheader statuscode="200" statustext="OK"> in the Missing Template Handler
to overwrite the HTTP status code on the way back to the browser. Then
everything works fine.
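
If you want to see the effect without an Apache/CF box handy, here is a
small stand-in written in Python. It is only a model of the situation, not
how Apache or CF work internally: a toy local server always returns the
real page content, and the made-up override_status flag plays the role of
the <cfheader> line. The class name, the flag, and the fetch helper are
all invented for this sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Shoes, product 37</body></html>"


class MissingTemplateDemo(BaseHTTPRequestHandler):
    # Hypothetical switch standing in for <cfheader statuscode="200" ...>.
    override_status = False

    def do_GET(self):
        # The server has decided the file is missing (404), but the
        # handler generates the real content anyway.
        status = 200 if self.override_status else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(override_status):
    """Start the toy server on a free local port and request a deep URL."""
    MissingTemplateDemo.override_status = override_status
    server = HTTPServer(("127.0.0.1", 0), MissingTemplateDemo)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5
        )
        conn.request(
            "GET", "/top_level/catalog/category/shoes/product_id/37/index.cfm"
        )
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch(False) gives back a 404 status together with the full page
body, which is the confusing case described above. Calling fetch(True)
gives back the same body with a 200 status.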

This all makes sense, but why don't you need to do this under IIS? My guess
is that it's because of the option to have IIS not check for the existence
of a file before handing off the request. Since it doesn't check for the
existence of a file, it doesn't send a 404 status to CF. I'm not sure what
it does send to CF, though. I'm guessing that IIS is sending a 200 status
by default to CF and since CF is able to deal with the request correctly, it
never changes the header and the request just gets passed through with the
default 200 status code. Checking the validity of this assumption would
require far more knowledge of ISAPI programming than I have. Anyone care to
investigate?

Bottom line: The missing template handler is a great way to make search
engine friendly URLs under IIS and it can be made to work under Apache. It
appears that the method may require some tricky hacking to work under
different web servers though. I'd be curious to see other people's
experience trying to implement it under Netscape (iPlanet) and Website Pro.

Hope this helps explain things,

Judah




