[thelist] Identify a Web Crawler's request
Shawn K. Quinn
skquinn at xevious.kicks-ass.net
Wed Jul 7 00:32:14 CDT 2004
On 2004 July 06, Tuesday 07:55, David Travis wrote:
> Hi All,
>
> Interesting question.
>
> I am working on a site that requires IE6. To keep users of other
> browsers from accessing the site, I wrote a filter that checks the
> user agent string and redirects the user to an upgrade-your-browser
> page.
Why?
Mozilla Firefox 0.9 is not good enough? Konqueror 3.2.3 is not good
enough? Opera 7.0 is not good enough? Lynx 2.8.5 is not good enough?
> This redirection also causes requests from web-crawlers (search
> engines) to be redirected to this page.
As well it should, because you've resorted to blatantly clueless
behavior.
> The site contains a lot of content that I want added to the search
> engines' indexes.
They are World Wide Web search engines, not Microsoft IE-only narrow web
search engines. So no, your content does not belong in them until you
have a World Wide Web site.
> Now to the question: How do I identify a request from a web-crawler?
> Is there a standard header in the HTTP request to check? I am
> particularly interested in Google's headers, since it is the most
> popular.
Make a site for the World Wide Web, not just for one browser that only
works on PCs running Windows. Browser-detection garbage is the hallmark
of those patently devoid of clue.
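
To make the mechanism concrete, here is a minimal sketch of why a filter
like the one described sends crawlers to the upgrade page. It is written
in Python purely for illustration; the is_ie6 helper and the sample
User-Agent strings are assumptions, not anything taken from the site in
question. A crawler's User-Agent header simply does not contain the IE6
token, so it fails the same check as every other non-IE browser.

```python
# Hypothetical sketch of a user-agent filter like the one described in the
# quoted message. The helper name and sample strings are assumptions made
# for illustration only.

def is_ie6(user_agent: str) -> bool:
    """Return True only when the User-Agent header contains the MSIE 6 token."""
    return "MSIE 6" in user_agent


# Representative User-Agent strings (illustrative; real clients vary).
samples = {
    "IE6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Lynx": "Lynx/2.8.5rel.1 libwww-FM/2.14",
}

for name, ua in samples.items():
    # Anything that does not claim to be IE6 gets redirected, crawler or not.
    action = "serve page" if is_ie6(ua) else "redirect to upgrade page"
    print(f"{name}: {action}")
```

Dropping the check and serving the same page to every client makes the
crawler question go away without special-casing anyone.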
--
Shawn K. Quinn