[thelist] spiders spoofing MSIE

Darren Beale bealers_lists at exponetic.com
Wed Apr 14 09:04:39 CDT 2004


Hi

I'm working on a project where one subtask is to filter out automated 
agents by their USER_AGENT string. I've done a fair bit of research and 
have come up with a few good lists of bot USER_AGENT strings, but I want 
to ensure that the data is as accurate as possible. I particularly want 
to make sure that I'm not missing any agents that pretend to be MSIE, as 
these are much harder to pick up using regexps.

The following is what I've been able to glean from Googling. Does anyone 
know of any more?

-----------------8<-----------------

Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; Girafabot; girafabot at 
girafa dot com; http://www.girafa.com)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) VoilaBot BETA 1.2 
(http://www.voila.com/)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0; Girafabot; girafabot 
at girafa dot com; http://www.girafa.com)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; ODP links test; 
http://tuezilla.de/test-odp-links-agent.html)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Link Checker Pro 
3.1.52, http://www.Link-Checker-Pro.com)
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) WebWasher 3.3
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) / StripIt 0.4
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) RPT-HTTPClient/0.3-3E
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; Find LinkChecker 
Web Crawler Spider Gatherer)
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent
Mozilla/4.0 (compatible; MSIE 5.5; AOL 7.0; Windows 95; sureseeker.com)

----------------->8-----------------
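
For illustration, here is a rough sketch of how strings like the ones 
above might be matched, written in Python. The token list is only what 
can be read off the strings quoted above, and the names BOT_TOKENS, 
MSIE_RE, BOT_RE and is_msie_spoofing_bot are made up for this example. 
It is a starting point under those assumptions, not a complete filter.

```python
import re

# Hypothetical token list, taken only from the strings quoted above.
# Some entries (e.g. WebWasher, a filtering proxy) may not be spiders
# in the strict sense; trim the list to suit the project.
BOT_TOKENS = [
    "Girafabot",
    "VoilaBot",
    "ODP links test",
    "Link Checker Pro",
    "WebWasher",
    "StripIt",
    "RPT-HTTPClient",
    "LinkChecker",
    "DTS Agent",
    "sureseeker.com",
]

# Matches the usual MSIE prefix, e.g. "Mozilla/4.0 (compatible; MSIE 5.01".
MSIE_RE = re.compile(r"^Mozilla/4\.0 \(compatible; MSIE \d+\.\d+")

# One alternation of all tokens, escaped so "." and spaces match literally.
BOT_RE = re.compile("|".join(re.escape(t) for t in BOT_TOKENS), re.IGNORECASE)


def is_msie_spoofing_bot(user_agent):
    """Return True if the string claims to be MSIE and carries a known token."""
    return bool(MSIE_RE.search(user_agent)) and bool(BOT_RE.search(user_agent))


print(is_msie_spoofing_bot(
    "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) WebWasher 3.3"))  # True
print(is_msie_spoofing_bot(
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # False
```

Matching on a token list rather than on one pattern per full string 
means the MSIE version and Windows version can vary without the check 
breaking, but it will only ever catch agents whose tokens are already 
known.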

Many thanks in advance

Darren Beale


