[thelist] User Agent question

Shawn K. Quinn skquinn at xevious.kicks-ass.net
Sat Feb 28 00:32:06 CST 2004


On Friday 2004 February 27 18:50, Maximillian Schwanekamp wrote:
> Got a user agent question - I'm setting up a 404 script for my client
> to redirect traffic according to the URL they entered.  For
> optimization's sake, I would like to skip all processing if the user
> agent is a spider.  Is there an easy way to differentiate between a
> spider and a browser (at least for non-spoofed user agents!)?  I am
> thinking I can safely treat the UA as a spider if the UA string has
> none of "Mozilla", "MSIE", "Opera", "Gecko".

You should add at least "Safari", "Konqueror", "Lynx", "Links", and 
"w3m" to that list.

Really, the best way to detect robots would be to match against a 
list of known robot User-Agent strings instead of guessing from 
browser tokens. However, I have serious doubts about whether this is 
the real problem; read on.
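
Concretely, the known-robots approach might look like this (the robot
tokens shown are just a few common examples; extend the list as new
spiders turn up in the server logs):

    import os

    # Substrings that identify known robots.
    ROBOT_TOKENS = ("Googlebot", "Slurp", "msnbot", "ia_archiver")

    ua = os.environ.get("HTTP_USER_AGENT", "")
    if any(token in ua for token in ROBOT_TOKENS):
        pass  # known robot: serve a plain 404 and stop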

> I am not looking for perfect treatment - if a UA string is spoofed,
> fine. But when a spider hits the site 50-100 times, there is no reason
> for the script to be doing a database lookup every time.

Maybe it would be best not to have so many 404 errors to begin with. 
Remember, "URIs don't change; people change them" (from the W3C Style 
Guide). If the old URLs are known, permanent redirects will keep both 
browsers and spiders away from the 404 script entirely.
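
A rough sketch in the same CGI style as above (the paths and the
static map are hypothetical; a real setup would more likely use the
web server's own redirect facility):

    import os

    # Static map of moved URLs, checked before any database work.
    # Paths are hypothetical examples.
    REDIRECTS = {
        "/old-page.html": "/new-page.html",
    }

    # Note: REQUEST_URI may carry a query string in a real script.
    path = os.environ.get("REQUEST_URI", "")
    if path in REDIRECTS:
        # Emit a permanent redirect instead of serving a 404.
        print("Status: 301 Moved Permanently")
        print("Location: %s" % REDIRECTS[path])
        print()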

-- 
Shawn K. Quinn

