[thelist] evolt 404: The explanation
Judah McAuley
judah at alphashop.com
Wed Feb 7 18:21:31 CST 2001
At 11:32 PM 2/7/2001 +0100, you wrote:
>Simply use an url like this [you see the difference? ;-)]:
>http://www.bob.com/index.cfm/top_level/catalog/category/shoes/product_id/37
>
>Apache would detect that "index.cfm" is a script, and would ignore the rest
>of the url. The arguments could be retrieved using a function that returns
>the ending part of the url.
My first foray into dealing with the question mark issue ended up with a
solution exactly like that. However, after talking more with our search
engine marketing person, we decided that some spiders may have trouble
with that index.cfm in the middle of a URL. They are very finicky. One of
the reasons spiders don't crawl pages with question marks is that dynamic
sites can more easily be built to trap a spider and manipulate it. The URL
you put together is obviously a dynamic URL, and it would be simple to
program a spider not to follow links resembling it. However,
http://www.bob.com/top_level/catalog/category/shoes/product_id/37/index.cfm
is a perfectly valid URL. There is no good way that I can think of for a
spider to tell whether or not it's a "real" directory with real files.
That's why my company has moved on to using the solution I mentioned.
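
To make the idea concrete, here is a rough sketch of how the arguments
could be pulled back out of either URL form. It is written in Python
purely for illustration (our actual pages are ColdFusion), and the
function name parse_path_params and its script_name parameter are made
up for this example. It assumes the path segments always alternate
key/value and that the script is always called index.cfm.

```python
from urllib.parse import urlsplit


def parse_path_params(url, script_name="index.cfm"):
    """Return the key/value pairs encoded as alternating path segments.

    Hypothetical helper: assumes segments alternate key, value, key, value
    and that script_name is the only segment that is not part of a pair.
    """
    # Split the path on "/" and discard empty pieces (leading/trailing slash).
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Drop the script name wherever it sits: mid-path or at the very end.
    segments = [s for s in segments if s != script_name]
    # Pair every even-positioned segment (key) with the one after it (value).
    return dict(zip(segments[0::2], segments[1::2]))
```

Under those assumptions, both the index.cfm-in-the-middle URL and the
index.cfm-at-the-end URL come back as the same mapping, with every value
as a string, so the page code does not need to care which form the link
used.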
Judah