[thelist] Identify a Web Crawler's request
raditha dissanayake
jabber at raditha.com
Tue Jul 6 12:21:52 CDT 2004
David Travis wrote:
>Hi All,
>
>Interesting question.
>
>I am working on a site, which requires IE6. In order to prevent users who
>work with other browsers from accessing the site I wrote some kind of filter
>to check the user agent string, and redirect the user to an
>upgrade-your-browser page. This redirection also causes requests from
>web-crawlers (search engines) to be redirected to this page.
>
>The site contains a lot of content, which I want to be added to the search
>engines' indexes.
>
>Now to the question: How do I identify a request from a web-crawler? Is
>there a standard header in the HTTP Request to check? I am particularly
>interested in Google's headers since it is most popular.
>
>
>
The User-Agent header carries this information, and you are apparently
already checking it in your filter. However, what you are about to do
could get you in trouble with Google: search engines take a dim view of
sites that serve one set of pages to crawlers and another set to humans
(a practice usually called cloaking).
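
For what it's worth, here is a rough sketch of the kind of check I mean,
in Python. The function names and the token list are made up for
illustration; "Googlebot" does appear in Google's crawler user agent,
but the other tokens are only examples and you would need to maintain
the list yourself. Bear in mind the header is trivially spoofed, so this
is a hint rather than proof that a request comes from a crawler.

```python
# Hypothetical, illustrative list of substrings found in crawler user agents.
# Only a starting point; extend it with the crawlers you care about.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def needs_upgrade_page(user_agent):
    """Return True if the request should go to the upgrade-your-browser page.

    Crawlers are let through to the same pages IE6 users see; every other
    client that does not identify itself as IE6 is redirected.
    """
    ua = user_agent or ""
    if is_crawler(ua):
        return False
    return "MSIE 6" not in ua
```

Note that the crawler gets exactly the pages an IE6 visitor gets, not a
special version, which is the safer side of the line described above.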
--
Raditha Dissanayake.
---------------------------------------------
http://www.raditha.com/megaupload/upload.php
Sneak past the PHP file upload limits.