[thelist] Identify a Web Crawler's request

Tue Jul 6 12:37:09 CDT 2004

David Travis wrote:
> Hi All,
> 
> Interesting question.
> 
> I am working on a site, which requires IE6. In order to prevent users who
> work with other browsers from accessing the site I wrote some kind of filter
> to check the user agent string, and redirect the user to an
> upgrade-your-browser page. This redirection also causes requests from
> web-crawlers (search engines) to be redirected to this page.
> 
> The site contains a lot of content, which I want to be added to the search
> engines' indexes.
> 
> Now to the question: How do I identify a request from a web-crawler? Is
> there a standard header in the HTTP Request to check? I am particularly
> interested in Google's headers since it is most popular.
> 
> Thanks in advance,
> David.

Your question about identifying search engine spiders is a good one, but
AFAIK there is no cut-and-dried way to identify spiders, as each one
sends its own user agent string. But I fear you will not be happy with
my or other replies to your question, as the reason for the question is
somewhat baffling (and will probably annoy the heck out of lots of
listers). It raises the question: why is IE6 required for your site? I'm
guessing it is not a corporate intranet, or you would likely want to
avoid having it crawled. I'm giving you the benefit of the doubt in
hoping that perhaps you have a good reason for shutting out all other
browsers from the site (though I can't think of any good reasons
myself), but don't be surprised if you get flamed about this :)

Also, out of curiosity: what do non-IE6 visitors to the site see?
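
If you do keep the filter, about the best you can do is match the
User-Agent header against the spiders you care about before redirecting.
Below is a rough Python sketch of that idea, not a definitive
implementation. The function names and the token list are my own
assumptions for illustration: the list is nowhere near exhaustive, and
since any client can send any user agent string, a match only means
"probably a crawler".

```python
# Hypothetical, non-exhaustive list of substrings found in the User-Agent
# strings of some well-known spiders (an assumption; extend as needed).
KNOWN_CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_probable_crawler(user_agent):
    """Return True if the User-Agent value contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)


def needs_upgrade_redirect(user_agent):
    """Decide whether to send the visitor to the upgrade-your-browser page.

    Probable crawlers are let through so the content can be indexed;
    everyone else is redirected unless the string looks like IE6.
    """
    if is_probable_crawler(user_agent):
        return False
    return "MSIE 6" not in (user_agent or "")
```

In your filter you would call needs_upgrade_redirect() with the value of
the request's User-Agent header and redirect only when it returns True.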

-- 
Sarah Sweeney
Web Developer & Programmer
Portfolio :: http://sarah.designshift.com
Blog, etc :: http://hardedge.ca