[thelist] Identify a Web Crawler's request

Feingold Josh S Josh.S.Feingold at irs.gov
Tue Jul 6 12:34:41 CDT 2004


David -

You will want to check the "User-Agent" header in the HTTP request for a
substring such as "Googlebot", which Google's crawler includes in its
user agent string.
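
To make the idea concrete, here is a minimal sketch in Python. It is not
tied to your setup: the function names and the list of crawler substrings
are my own assumptions, and you would extend the list for the other search
engines you care about. Keep in mind that the User-Agent header is supplied
by the client, so anyone can fake it.

```python
# Hypothetical list of lowercase substrings to look for; "googlebot" appears
# in Google's crawler User-Agent, the rest of the list is yours to fill in.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def should_redirect_to_upgrade_page(user_agent):
    """Let crawlers through; send everything else that is not IE6 away."""
    if is_crawler(user_agent):
        return False
    # Assumption: IE6 is recognised by "MSIE 6" in its User-Agent string.
    return "MSIE 6" not in (user_agent or "")
```

The filter would call should_redirect_to_upgrade_page() with the value of
the request's User-Agent header and redirect only when it returns True.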

On a side note, why are you requiring IE6?  Unless you have a business need,
I personally don't think it is best practice to limit your users' browser
selection.

Josh



-----Original Message-----
From: thelist-bounces at lists.evolt.org
[mailto:thelist-bounces at lists.evolt.org] On Behalf Of David Travis
Sent: Tuesday, July 06, 2004 8:56 AM
To: thelist at lists.evolt.org
Subject: [thelist] Identify a Web Crawler's request


Hi All,

Interesting question.

I am working on a site that requires IE6. To prevent users of other
browsers from accessing the site, I wrote a filter that checks the user
agent string and redirects the user to an upgrade-your-browser page. This
redirection also causes requests from web crawlers (search engines) to be
redirected to this page.

The site contains a lot of content, which I want to be added to the search
engines' indexes.

Now to the question: How do I identify a request from a web crawler? Is
there a standard header in the HTTP request to check? I am particularly
interested in Google's headers, since it is the most popular search engine.

Thanks in advance,
David.

