[thelist] Bots killing my server

Norman Bunn norman.bunn at craftedsolutions.com
Wed Feb 6 14:25:22 CST 2008


After re-reading my message, I realize some might think the robots.txt 
file is in the cgi-bin directory. It is not; it is in the HTTP root. 
The MSN log entries look like this:

65.55.165.13 - - [05/Feb/2008:04:07:29 -0500] "GET 
/cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt&Date=2008/7/1&Amount=Month&NavType=Both&Type=Block 
HTTP/1.0" 200 19941 
"http://search.live.com/results.aspx?q=calendar&mrt=en-us&FORM=LIVSOP" 
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"

65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" 
200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET 
/cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1 
HTTP/1.1" 200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /robots.txt HTTP/1.1" 
200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET 
/cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt&Amount=Month&NavType=Absolute&Type=Block&Date=2007%2F6%2F1 
HTTP/1.1" 200 20968 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

The misbehaving Yahoo bot is trying to access the calendar by using "/?" 
instead of "/cgi-bin/":

74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET 
/?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5 
HTTP/1.0" 200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; 
http://help.yahoo.com/help/us/ysearch/slurp)"
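
To see which crawlers account for the calendar traffic, the log can be tallied by user agent. This is a minimal sketch, assuming the access log is in Apache combined format as in the entries above. The helper name `count_calendar_hits` and the `SAMPLE` list are illustrative only; the sample lines are three of the entries shown above, rejoined onto single lines.

```python
import re
from collections import Counter

# Apache "combined" log format, matching the entries quoted above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_calendar_hits(lines):
    """Count calendar requests per user agent (hypothetical helper)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a combined-format line
        parts = match.group("request").split()
        if len(parts) != 3:
            continue  # malformed request line
        path = parts[1]
        # Every calendar request in the log carries a CalendarName parameter.
        if "CalendarName=" in path:
            hits[match.group("agent")] += 1
    return hits


SAMPLE = [
    '65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" '
    '200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET '
    '/cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member'
    '&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1 HTTP/1.1" '
    '200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET '
    '/?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both'
    '&Type=Condensed&Date=2008%2F2%2F5 HTTP/1.0" 200 20017 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
]

for agent, count in count_calendar_hits(SAMPLE).most_common():
    print(count, agent)
```

Run against the full log instead of `SAMPLE`, this would give a rough per-crawler count of calendar requests.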

Any suggestions on stopping these?

---

Norman W. Bunn
norman.bunn at craftedsolutions.com
803.405.1008
----------------------------------------------
www.CraftedSolutions.com
Crafted Solutions, Inc.
Web Design & Development
Web Site Hosting & Custom Solutions
"Get the results the Internet promises;
 get the 'Net Result' from Crafted Solutions!"
----------------------------------------------



Norman Bunn wrote:
> I have a calendar app that is running on my server that bots are 
> accessing in spite of my robots.txt file saying otherwise (it's in 
> cgi-bin).  Specifically, MSN and Yahoo are the problem.  I had this 
> problem several years back and the update to the robots.txt file fixed it 
> then, but not now.  What other avenues are available to me?  Here's the 
> contents:
>
> User-agent: *
> Disallow: /administrator/
> Disallow: /cache/
> Disallow: /components/
> Disallow: /editor/
> Disallow: /help/
> Disallow: /images/
> Disallow: /includes/
> Disallow: /language/
> Disallow: /mambots/
> Disallow: /media/
> Disallow: /modules/
> Disallow: /templates/
> Disallow: /installation/
> Disallow: /cgi-bin/
> User-agent: ia_archiver
> Disallow: /
> User-agent: msnbot
> Crawl-delay: 30
>
> Thanks,
>
> Norman
>
>   
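
One way to check how the quoted robots.txt reads to a crawler is to feed it to a parser. Under the robots exclusion convention, a robot that finds a record naming it uses that record in place of the `*` record, so the `msnbot` record (which has only a Crawl-delay) may leave /cgi-bin/ open to msnbot. Likewise, no rule covers the "/?" URLs Yahoo is requesting. The sketch below uses Python's standard `urllib.robotparser`. It is an assumption that msnbot and Slurp interpret the file the same way this parser does. The shortened URLs and the `FIXED_TXT` variant are illustrative, not a confirmed fix.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as quoted above (no blank lines between records).
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /cgi-bin/
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Crawl-delay: 30
"""

# Hypothetical variant: repeat the Disallow inside the msnbot record,
# which is the last record, so appending a line extends it.
FIXED_TXT = ROBOTS_TXT + "Disallow: /cgi-bin/\n"


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
fixed = build_parser(FIXED_TXT)

# Shortened versions of the URLs seen in the log entries.
calendar_url = "/cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt"
root_url = "/?CalendarName=Around_Newberry&Op=ShowIt"

for agent in ("msnbot", "Yahoo! Slurp"):
    for url in (calendar_url, root_url):
        print(agent, url, "allowed:", parser.can_fetch(agent, url))

print("msnbot with Disallow repeated:", fixed.can_fetch("msnbot", calendar_url))
```

With this parser, the original file allows msnbot into /cgi-bin/ and allows any agent to fetch the "/?" URLs, while the variant with the Disallow line repeated in the msnbot record blocks msnbot from /cgi-bin/.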


