[thelist] Identify a Web Crawler's request

Shawn K. Quinn skquinn at xevious.kicks-ass.net
Wed Jul 7 00:32:14 CDT 2004

On 2004 July 06, Tuesday 07:55, David Travis wrote:
> Hi All,
> Interesting question.
> I am working on a site, which requires IE6. In order to prevent users
> who work with other browsers from accessing the site I wrote some
> kind of filter to check the user agent string, and redirect the user
> to an upgrade-your-browser page.


Mozilla Firefox 0.9 is not good enough? Konqueror 3.2.3 is not good 
enough? Opera 7.0 is not good enough? Lynx 2.8.5 is not good enough?

> This redirection also causes requests from web-crawlers (search
> engines) to be redirected to this page. 

As well it should, because you've resorted to blatantly clueless 

> The site contains a lot of content, which I want to be added to the
> search engines' indexes.

They are World Wide Web search engines, not Microsoft IE-only narrow web 
search engines. So no, your content does not belong in them until you 
have a World Wide Web site.

> Now to the question: How do I identify a request from a web-crawler?
> Is there a standard header in the HTTP Request to check? I am
> particularly interested in Google's headers since it is most popular.

Make a site for the World Wide Web, not just one browser that only works 
on PCs running Windows. Browser detect garbage is the hallmark of those 
patently devoid of clue.
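
To illustrate why the filter described in the quoted message catches crawlers too, here is a minimal Python sketch. The function name, the "MSIE 6" substring test, and the sample user-agent strings are assumptions made for illustration; this is not the original poster's code. Crawlers generally announce themselves only through the same User-Agent header that browsers use, so a filter that admits one browser turns away everything else.

```python
# Hypothetical reconstruction of an IE6-only user-agent filter.
# The substring test below is an assumption, not the original poster's code.
IE6_TOKEN = "MSIE 6"


def is_redirected(user_agent: str) -> bool:
    """Return True if an IE6-only filter would send this client to the upgrade page."""
    return IE6_TOKEN not in user_agent


# Sample user-agent strings (illustrative values).
ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
googlebot = "Googlebot/2.1 (+http://www.google.com/bot.html)"
lynx = "Lynx/2.8.5rel.1 libwww-FM/2.14"

for ua in (ie6, googlebot, lynx):
    print(ua, "->", "redirect" if is_redirected(ua) else "allow")
```

Only the IE6 string passes; the crawler and Lynx are both redirected, which is the behaviour the original question complains about.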

Shawn K. Quinn
