[thelist] User Agent question

Maximillian Schwanekamp anaxamaxan at neptunewebworks.com
Sat Feb 28 01:15:14 CST 2004


>"Safari", "Konqueror", "Lynx", "Links", and "w3m"
Well, Safari uses the "Mozilla/" keyword, as do Konqueror and Galeon.
I forgot about Lynx and Links.  Thanks!  I also overlooked iCab, and
just learned that Opera often does not use "Mozilla/" in its UA string.
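
For instance, depending on its identification setting, Opera 7 sends
something like one of these (examples approximate, from memory):

    Opera/7.23 (Windows NT 5.1; U) [en]
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.23 [en]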

>Maybe it would be best to not have so many 404 errors to begin with.
No, it's intentional.  The 404 script would redirect to the correct scripts
based on keywords stored in a database.  Only truly bad links would
terminate in a 404.
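
To sketch the idea (rough Python; the keyword table and all the names
here are made up for illustration, it's not the real script):

    #!/usr/bin/env python
    # Rough sketch of the 404 handler idea; illustrative only.
    import os
    import re

    # Browser tokens (see the discussion in this thread; the list
    # is incomplete).
    BROWSER_TOKENS = ("Mozilla", "MSIE", "Opera", "Gecko",
                      "Safari", "Konqueror", "Lynx", "Links", "w3m")

    # Stand-in for the real database: URL keywords -> target pages.
    REDIRECTS = {"widgets": "/products/widgets.html"}

    def not_found():
        print("Status: 404 Not Found")
        print("Content-Type: text/plain")
        print("")
        print("Not found.")

    def main():
        ua = os.environ.get("HTTP_USER_AGENT", "")
        path = os.environ.get("REQUEST_URI", "")
        # Skip the database work entirely for (unspoofed) spiders.
        if not any(token in ua for token in BROWSER_TOKENS):
            not_found()
            return
        # Pull keywords out of the requested URL, look for a match.
        for word in re.findall(r"[a-z0-9]+", path.lower()):
            if word in REDIRECTS:
                print("Location: %s" % REDIRECTS[word])
                print("")
                return
        not_found()  # a truly bad link

    if __name__ == "__main__":
        main()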

BUT since I posted this question, I did find a few good lists of UA
strings [1].  I can see that trying to sort humans from robots, even
with an acceptable margin of error, gets messy fast.  Oh well, forget I
ever brought it up!

[1] http://www.pgts.com.au/pgtsj/pgtsj0208c.html, for example.

Maximillian Von Schwanekamp
Websites for Profitable Microbusiness
NeptuneWebworks.com
voice: 541-302-1438
fax: 208-730-6504


-----Original Message-----
From: Shawn K. Quinn [mailto:skquinn at xevious.kicks-ass.net]
Sent: Friday, February 27, 2004 10:32 PM
To: thelist at lists.evolt.org
Subject: Re: [thelist] User Agent question


On Friday 2004 February 27 18:50, Maximillian Schwanekamp wrote:
> Got a user agent question - I'm setting up a 404 script for my client
> to redirect traffic according to the URL they entered.  For
> optimization's sake, I would like to skip all processing if the user
> agent is a spider.  Is there an easy way to differentiate between a
> spider and a browser (at least for non-spoofed user agents)?  I am
> thinking I can safely treat the UA as a spider if the UA string
> contains none of "Mozilla", "MSIE", "Opera", "Gecko".

You should add at least "Safari", "Konqueror", "Lynx", "Links", and
"w3m" to that list.

Really, the best way to detect robots would be the opposite approach:
match against a list of known robot User-Agent strings.  However, I
have my doubts about what the real problem is; read on.
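
For example (Python again; these are just a few well-known robot
tokens, and published lists run much longer):

    # Positive match against known robots instead of guessing from
    # browser tokens.  Only a few examples shown.
    ROBOT_TOKENS = ("Googlebot", "Slurp", "msnbot", "ia_archiver",
                    "Teoma")

    def looks_like_robot(ua):
        return any(token in ua for token in ROBOT_TOKENS)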

> I am not looking for perfect treatment - if a UA string is spoofed,
> fine. But when a spider hits the site 50-100 times, there is no reason
> for the script to be doing a database lookup every time.

Maybe it would be best not to have so many 404 errors to begin with.
Remember, "URIs don't change: people change them." (from the W3C Style
Guide)

--
Shawn K. Quinn