[thelist] Identify a Web Crawler's request

J Nicholas Tolson jtnt at mindspring.com
Tue Jul 6 12:36:01 CDT 2004


It seems that the Googlebot user-agent string is in flux, but looking for
the text "Googlebot" in the string should single out their bot.
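
As a rough illustration, here is how the substring check could be combined with a browser filter like the one David describes. This is a minimal Python sketch that assumes the filter sees the raw User-Agent header value. The function names are hypothetical. Using the token "MSIE 6.0" to detect IE6 is also an assumption of the sketch: other browsers can send that token too, and any client can claim to be Googlebot.

```python
from typing import Optional

# Hypothetical helpers for an "upgrade your browser" filter that still lets
# Googlebot through. The User-Agent checks are simple substring tests.


def is_googlebot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent mentions Googlebot (case-insensitive match)."""
    return "googlebot" in (user_agent or "").lower()


def is_ie6(user_agent: Optional[str]) -> bool:
    """Rough IE6 check (assumption): IE6 user-agent strings contain 'MSIE 6.0'."""
    return "MSIE 6.0" in (user_agent or "")


def should_redirect(user_agent: Optional[str]) -> bool:
    """Redirect every request except those from IE6 or Googlebot."""
    if is_googlebot(user_agent):
        return False
    return not is_ie6(user_agent)


if __name__ == "__main__":
    # Sample User-Agent strings, for illustration only.
    samples = [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616",
    ]
    for ua in samples:
        print(should_redirect(ua), ua)
```

With these samples, the Googlebot and IE6 requests are let through and the Gecko request is redirected. The sketch uses a substring test instead of an exact match because the full Googlebot string changes over time.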

See here for more info on the topic:



On 7/6/04 8:55 AM, "David Travis" <dwork at macam.ac.il> wrote:

> Hi All,
> 
> Interesting question.
> 
> I am working on a site that requires IE6. To prevent users of other
> browsers from accessing the site, I wrote a filter that checks the
> user-agent string and redirects them to an upgrade-your-browser page.
> This redirection also sends requests from web crawlers (search engines)
> to that page.
> 
> The site contains a lot of content that I want included in the search
> engines' indexes.
> 
> Now to the question: how do I identify a request from a web crawler? Is
> there a standard HTTP request header to check? I am particularly
> interested in Google's headers, since it is the most popular search engine.
> 
> Thanks in advance,
> David.
> 


