[thelist] Re: Identify a Web Crawler's request

Michael Harrington mike0351 at bellsouth.net
Wed Jul 7 00:56:41 CDT 2004


What about putting <meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> on the browser upgrade page?

This works for me with all the search engines. My browser detection flags IE below 5.0 and Netscape below 6.0. Visitors with those older
browsers get a page telling them they should upgrade, but I still let them go ahead and try to view my site with the browser they are
using.
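
Here is a rough sketch of how the check could look on the server side, in Python. It is only an illustration, not my actual code: the function names, the crawler token list ("googlebot", "slurp", "msnbot") and the page labels are assumptions, and it only covers the IE version check (Netscape detection is left out). The idea is to let anything that looks like a crawler through to the content, send old IE to the upgrade page, and put the ROBOTS meta tag on that upgrade page so it stays out of the indexes.

```python
import re

# Assumed substrings that identify crawlers in the User-Agent header.
# This list is illustrative; check each search engine's documentation.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

# The tag from the message above, meant for the upgrade page only.
UPGRADE_PAGE_META = '<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'


def is_crawler(user_agent):
    """Return True if the User-Agent contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def msie_version(user_agent):
    """Return the IE version as a float, or None if this is not IE."""
    match = re.search(r"MSIE (\d+(?:\.\d+)?)", user_agent or "")
    return float(match.group(1)) if match else None


def choose_page(user_agent):
    """Pick "content" or "upgrade" for a request (hypothetical labels)."""
    if is_crawler(user_agent):
        return "content"  # never send crawlers to the upgrade page
    version = msie_version(user_agent)
    if version is not None and version < 5.0:
        return "upgrade"  # old IE gets the upgrade notice
    return "content"
```

Keep in mind the User-Agent header can be faked or missing, so this is a best guess rather than proof that a request really comes from a search engine.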

thelist-request at lists.evolt.org wrote:

> Date: Tue, 6 Jul 2004 14:55:40 +0200
> From: "David Travis" <dwork at macam.ac.il>
> To: <thelist at lists.evolt.org>
> Subject: [thelist] Identify a Web Crawler's request
> Message: 3
>
> Hi All,
>
> Interesting question.
>
> I am working on a site that requires IE6. To keep users of other browsers
> from accessing the site, I wrote a filter that checks the user agent string
> and redirects the user to an upgrade-your-browser page. This redirection
> also sends requests from web crawlers (search engines) to that page.
>
> The site contains a lot of content, which I want to be added to the search
> engines' indexes.
>
> Now to the question: How do I identify a request from a web-crawler? Is
> there a standard header in the HTTP Request to check? I am particularly
> interested in Google's headers since it is most popular.
>
> Thanks in advance,
> David.
>
> ------------------------------


