[thelist] Identify a Web Crawler's request

raditha dissanayake jabber at raditha.com
Tue Jul 6 12:21:52 CDT 2004


David Travis wrote:

>Hi All,
>
>Interesting question.
>
>I am working on a site, which requires IE6. In order to prevent users who
>work with other browsers from accessing the site I wrote some kind of filter
>to check the user agent string, and redirect the user to an
>upgrade-your-browser page. This redirection also causes requests from
>web-crawlers (search engines) to be redirected to this page.
>
>The site contains a lot of content, which I want to be added to the search
>engines' indexes.
>
>Now to the question: How do I identify a request from a web-crawler? Is
>there a standard header in the HTTP Request to check? I am particularly
>interested in Google's headers since it is most popular.
>
>  
>
The User-Agent header carries this information, and you are apparently 
already checking it in your filter. However, what you are about to do 
could get you in trouble with Google. Search engines take a dim view of 
sites that serve one set of pages to crawlers and another set to humans 
(a practice usually called cloaking).
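
If you do go the user agent route, a rough sketch of the check might look 
like the following. This is only an illustration in Python, not your 
actual filter: the function names and the token list are made up for the 
example, and the tokens are a few well-known crawler names rather than a 
complete or official list. Note that it only skips the upgrade redirect 
for crawlers; the pages they receive are the same ones an IE6 user gets.

```python
# Hypothetical token list for illustration; not exhaustive or official.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def needs_upgrade_redirect(user_agent):
    """Redirect only clients that are neither a crawler nor MSIE 6."""
    if is_crawler(user_agent):
        return False
    # Assumes IE6 is recognised by the "MSIE 6" substring in the header.
    return "MSIE 6" not in (user_agent or "")
```

Keep in mind that the user agent string is supplied by the client and is 
easy to fake, so treat the result as a hint rather than proof.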



-- 
Raditha Dissanayake.
---------------------------------------------
http://www.raditha.com/megaupload/upload.php
Sneak past the PHP file upload limits.


