[thelist] Identify a Web Crawler's request

Shawn K. Quinn skquinn at xevious.kicks-ass.net
Wed Jul 7 00:32:14 CDT 2004


On 2004 July 06, Tuesday 07:55, David Travis wrote:
> Hi All,
>
> Interesting question.
>
> I am working on a site that requires IE6. In order to prevent users
> who work with other browsers from accessing the site I wrote some
> kind of filter to check the user agent string, and redirect the user
> to an upgrade-your-browser page.

Why?

Mozilla Firefox 0.9 is not good enough? Konqueror 3.2.3 is not good 
enough? Opera 7.0 is not good enough? Lynx 2.8.5 is not good enough?

> This redirection also causes requests from web-crawlers (search
> engines) to be redirected to this page. 

As well it should, because you've resorted to blatantly clueless 
behavior.

> The site contains a lot of content, which I want to be added to the
> search engines' indexes.

They are World Wide Web search engines, not Microsoft IE-only narrow web 
search engines. So no, your content does not belong in them until you 
have a World Wide Web site.

> Now to the question: How do I identify a request from a web-crawler?
> Is there a standard header in the HTTP Request to check? I am
> particularly interested in Google's headers since Google is the most popular.

Make a site for the World Wide Web, not for a single browser that runs
only on PCs running Windows. Browser-detection garbage is the hallmark of
those patently devoid of clue.
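
For what it's worth, the literal answer to the question as asked: HTTP
has no dedicated "this is a crawler" header. Crawlers that want to be
recognized announce themselves in the ordinary User-Agent request header,
and Google's crawler includes the token "Googlebot" there. A rough sketch
in Python follows; the function name and the token list are illustrative
assumptions, not anything from the original post, and the header is
trivially spoofed, so treat the result as a hint at best.

```python
# Hand-picked, illustrative substrings (an assumption, not an exhaustive
# or authoritative list of crawler identifiers).
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def looks_like_crawler(user_agent):
    """Return True if the User-Agent value contains a known crawler token.

    Heuristic only: any client can send any User-Agent string it likes.
    """
    if not user_agent:
        # A missing or empty header tells us nothing either way.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)
```

None of this changes the point above: a site that serves the same pages
to every user agent has no need for such a check.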

-- 
Shawn K. Quinn

