[thelist] Identify a Web Crawler's request
J Nicholas Tolson
jtnt at mindspring.com
Tue Jul 6 12:36:01 CDT 2004
The Googlebot user-agent string seems to be in flux, but checking the
User-Agent request header for the text "Googlebot" should single out
their bot.
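
As a rough illustration of that check, here is a minimal Python sketch. It
assumes your filter has the raw User-Agent header value as a string. The
function names, the token list, and the "MSIE 6" test are hypothetical and
are not taken from David's actual filter; only the "Googlebot" substring
comes from this thread.

```python
# Hypothetical token list: only "Googlebot" comes from this thread.
# Add other crawler names here if you want them indexed too.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """Return True if the User-Agent value contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()  # compare case-insensitively
    return any(token in ua for token in CRAWLER_TOKENS)


def needs_upgrade_redirect(user_agent):
    """Return True if the request should go to the upgrade-your-browser page.

    Crawlers are let through; everything else must look like IE6.
    Assumption: IE6 is detected by the "MSIE 6" substring.
    """
    if is_crawler(user_agent):
        return False
    return "MSIE 6" not in (user_agent or "")
```

The idea is to run the crawler test before the browser test, so a bot is
never sent to the upgrade page. The match is a plain substring test rather
than an exact comparison, so it should keep working if the rest of the
string changes.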
See here for more info on the topic:
On 7/6/04 8:55 AM, "David Travis" <dwork at macam.ac.il> wrote:
> Hi All,
>
> Interesting question.
>
> I am working on a site that requires IE6. To prevent users of other
> browsers from accessing the site, I wrote a filter that checks the user
> agent string and redirects the user to an upgrade-your-browser page.
> This redirection also causes requests from web crawlers (search engines)
> to be redirected to this page.
>
> The site contains a lot of content that I want added to the search
> engines' indexes.
>
> Now to the question: How do I identify a request from a web-crawler? Is
> there a standard header in the HTTP Request to check? I am particularly
> interested in Google's headers, since Google is the most popular.
>
> Thanks in advance,
> David.
>