[thelist] evolt 404: The explanation

Judah McAuley judah at alphashop.com
Wed Feb 7 18:21:31 CST 2001


At 11:32 PM 2/7/2001 +0100, you wrote:
>Simply use a url like this [you see the difference? ;-)]:
>http://www.bob.com/index.cfm/top_level/catalog/category/shoes/product_id/37
>
>Apache would detect that "index.cfm" is a script, and would ignore the rest
>of the url. The arguments could be retrieved using a function that returns
>the ending part of the url.

My first attempt at the question mark issue ended up with a solution
exactly like that.  However, after talking it over with our search engine
marketing person, we decided that some spiders may have trouble with
that index.cfm in the middle of a url.  Spiders are very finicky.  One of
the reasons they don't crawl pages with question marks is that dynamic
sites can more easily be built to trap a spider and manipulate it.  The
url you put together is obviously dynamic, and it would be simple to
program a spider not to follow links that look like it.  However,
http://www.bob.com/top_level/catalog/category/shoes/product_id/37/index.cfm
is a perfectly valid url.  I can't think of a good way for a spider to
tell whether or not that is a "real" directory with real files.  That's
why my company has moved on to using the solution I mentioned.
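
For anyone who wants to see the idea in code, here is a rough sketch of
pulling the arguments back out of either url shape.  It is written in
Python rather than ColdFusion just to keep it short, and it is not the
code we actually run.  The function name path_params is made up, and it
assumes the path segments always alternate name/value, which is how the
two example urls read.

```python
from urllib.parse import urlsplit


def path_params(url, script="index.cfm"):
    """Return name/value pairs encoded as path segments in url.

    Hypothetical helper: assumes segments alternate name, value, name,
    value, and that the script name may sit anywhere in the path.
    """
    # Split the path into non-empty segments.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Drop the script name whether it is in the middle or at the end.
    segments = [s for s in segments if s != script]
    # Pair even-indexed segments (names) with odd-indexed ones (values).
    return dict(zip(segments[0::2], segments[1::2]))


middle = "http://www.bob.com/index.cfm/top_level/catalog/category/shoes/product_id/37"
end = "http://www.bob.com/top_level/catalog/category/shoes/product_id/37/index.cfm"

print(path_params(middle))
print(path_params(end))
```

Both calls print {'top_level': 'catalog', 'category': 'shoes',
'product_id': '37'}, so the script sees the same arguments either way.
Only the url the spider sees changes.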

Judah




