[thelist] Identify a Web Crawler's request

J Nicholas Tolson jtnt at mindspring.com
Tue Jul 6 12:39:09 CDT 2004


On 7/6/04 8:55 AM, "David Travis" <dwork at macam.ac.il> wrote:

> Now to the question: How do I identify a request from a web-crawler? Is
> there a standard header in the HTTP Request to check? I am particularly
> interested in Google's headers since it is most popular.
> 
> Thanks in advance,
> David.
> 


Sorry for the previous incomplete and incorrectly formatted post; my fingers
hit the wrong key combo.


Crawlers normally identify themselves in the standard User-Agent request
header. The Googlebot user-agent string seems to be in flux, but looking
for the text "Googlebot" in that string should single out their bot.

See here for more info on the topic:
http://www.markcarey.com/googleguy-says/archives/googlebot-useragent-change.html
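
As a rough sketch of that substring check (Python is used only for
illustration; the function name is_googlebot and the sample strings are my
own assumptions, not anything published by Google), something like this
should do. Bear in mind that any client can send whatever User-Agent it
likes, so this identifies requests that claim to be Googlebot, nothing more.

```python
def is_googlebot(user_agent):
    """Return True if the User-Agent header value mentions Googlebot.

    `user_agent` is assumed to be the raw header value as a string,
    or None when the request carried no User-Agent header at all.
    """
    # A missing or empty header cannot match.
    if not user_agent:
        return False
    # Case-insensitive substring match, so the check keeps working
    # even if the rest of the string changes.
    return "googlebot" in user_agent.lower()


# Hypothetical sample header values, for illustration only.
samples = [
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]
for ua in samples:
    print(ua, "->", is_googlebot(ua))
```

How you get at the header value depends on your server setup; in a CGI
environment it usually shows up as the HTTP_USER_AGENT variable.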


Nicholas


