[thelist] Identify a Web Crawler's request

raditha dissanayake jabber at raditha.com
Tue Jul 6 12:21:52 CDT 2004

David Travis wrote:

>Hi All,
>Interesting question.
>I am working on a site, which requires IE6. In order to prevent users who
>work with other browsers from accessing the site I wrote some kind of filter
>to check the user agent string, and redirect the user to an
>upgrade-your-browser page. This redirection also causes requests from
>web-crawlers (search engines) to be redirected to this page.
>The site contains a lot of content, which I want to be added to the search
>engines' indexes.
>Now to the question: How do I identify a request from a web-crawler? Is
>there a standard header in the HTTP Request to check? I am particularly
>interested in Google's headers since it is most popular.

The User-Agent header carries this information, and you are apparently
already checking it in your filter. However, what you are about to do
could get you in trouble with Google. Search engines take a dim view of
serving one set of pages to crawlers and another set to humans (a
practice usually called cloaking).
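
For what it's worth, the check itself is only a substring match on that
header. Here is a rough sketch in Python, purely to show the mechanism.
The function names and the token list are made up for illustration; the
tokens are a few well-known crawler names, not an official or complete
list, and "MSIE 6" is assumed to be how the existing filter recognises
IE6. The warning above still applies to whatever you do with the result.

```python
# Illustrative crawler tokens (an assumption, not an exhaustive or official list).
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(token in ua for token in CRAWLER_TOKENS)


def needs_upgrade_page(user_agent):
    """Return True if the request should go to the upgrade-your-browser page."""
    if is_crawler(user_agent):
        return False  # let crawlers through to the real content
    # Assumed IE6 test: the header contains "MSIE 6" (case-insensitive).
    return "msie 6" not in (user_agent or "").lower()
```

Bear in mind that any client can send whatever User-Agent it likes, so a
match here is a hint and not proof that the request really comes from a
search engine.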


Raditha Dissanayake.
Sneak past the PHP file upload limits.