[thelist] Bots killing my server

Anthony Baratta anthony at baratta.com
Wed Feb 6 14:51:49 CST 2008


Are you on Windows or UNIX?

Can you access the robots.txt file directly via your web browser?

http://www.foo.org/robots.txt

It's been a while since I've worked with robots.txt, so I would double-check the syntax of the file. Are the directives case-sensitive?
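A quick way to see how a standards-following parser reads the quoted file is Python's standard-library `urllib.robotparser` module. This is a sketch, not a guarantee of how msnbot or Slurp actually behave: it assumes those crawlers pick records the way the robots.txt standard (and this parser) does, where a bot that matches its own `User-agent` line uses only that record and skips the `*` record. The helper name `check` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted in this thread, verbatim
# (note: no blank lines between records).
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /cgi-bin/
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Crawl-delay: 30
"""


def check(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if robots_txt lets `agent` fetch `path`, per urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Paths shortened from the log excerpts quoted in this thread.
CASES = [
    ("msnbot", "/cgi-bin/Calcium38.pl?Op=ShowIt&CalendarName=Newberry_Parks_Member"),
    ("Slurp", "/cgi-bin/Calcium38.pl?Op=ShowIt"),
    ("Slurp", "/?CalendarName=Around_Newberry&Op=ShowIt"),
]

if __name__ == "__main__":
    for agent, path in CASES:
        verdict = "allowed" if check(ROBOTS_TXT, agent, path) else "disallowed"
        print(f"{agent:8} {verdict:10} {path}")
```

Run against the file as quoted, msnbot comes back allowed on `/cgi-bin/` (its own record holds only `Crawl-delay: 30`, so it never sees `Disallow: /cgi-bin/`), while Slurp is disallowed on `/cgi-bin/` but allowed on the `/?CalendarName=...` URLs, since no `Disallow` line covers them. If the real crawlers behave the same way, one candidate fix is to repeat `Disallow: /cgi-bin/` inside the `User-agent: msnbot` record.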

One path of last resort is to use header-based rewrite rules that match the User-Agent string and deny those bots access to the target directories.
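On Apache, a rule like this is usually written with mod_rewrite (a `RewriteCond` on `%{HTTP_USER_AGENT}` followed by a `RewriteRule` with the `[F]` flag). The sketch below is not that configuration; it expresses the same decision logic in Python as WSGI middleware, which assumes the site is served through a WSGI stack mounted at the site root. The agent pattern and protected paths, and the names `BLOCKED_AGENTS`, `is_calendar_request`, `should_deny` and `block_bots`, are hypothetical choices based on the log excerpts quoted in this thread.

```python
import re
from typing import Callable, Iterable

# Hypothetical: User-Agent substrings of the crawlers to refuse (case-insensitive).
BLOCKED_AGENTS = re.compile(r"msnbot|slurp", re.IGNORECASE)


def is_calendar_request(path: str, query: str) -> bool:
    """Hypothetical scope: the calendar script itself, plus the root-level
    "/?CalendarName=..." form that Slurp was requesting."""
    return path.startswith("/cgi-bin/Calcium") or (
        path == "/" and "CalendarName=" in query
    )


def should_deny(user_agent: str, path: str, query: str = "") -> bool:
    """True when a blocked crawler asks for a calendar URL."""
    return bool(BLOCKED_AGENTS.search(user_agent)) and is_calendar_request(path, query)


def block_bots(app: Callable) -> Callable:
    """Wrap a WSGI app so matching requests get 403 Forbidden instead.

    Assumes the app is mounted at the site root, so PATH_INFO is the full path.
    """

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if should_deny(
            environ.get("HTTP_USER_AGENT", ""),
            environ.get("PATH_INFO", "/"),
            environ.get("QUERY_STRING", ""),
        ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, this refuses the request whether or not the crawler honors robots.txt; the trade-off is that matching on User-Agent only stops bots that identify themselves honestly.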

-----Original message-----
From: Norman Bunn norman.bunn at craftedsolutions.com
Date: Wed, 06 Feb 2008 12:25:22 -0800
To: "thelist at lists.evolt.org" thelist at lists.evolt.org
Subject: Re: [thelist] Bots killing my server

> After re-reading, I realize some might think the robots.txt file is in 
> the cgi-bin directory, but no, it is in the web root. MSN log 
> entries look like this:
> 
> 65.55.165.13 - - [05/Feb/2008:04:07:29 -0500] "GET /cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt&Date=2008/7/1&Amount=Month&NavType=Both&Type=Block HTTP/1.0" 200 19941 "http://search.live.com/results.aspx?q=calendar&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
> 
> 65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
> 
> 65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET /cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1 HTTP/1.1" 200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
> 
> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /robots.txt HTTP/1.1" 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
> 
> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt&Amount=Month&NavType=Absolute&Type=Block&Date=2007%2F6%2F1 HTTP/1.1" 200 20968 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
> 
> The misbehaving Yahoo bot is trying to access the calendar by using "/?" 
> instead of /cgi-bin/:
> 
> 74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET /?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5 HTTP/1.0" 200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
> 
> Any suggestions on stopping these?
> 
> ---
> 
> Norman W. Bunn
> norman.bunn at craftedsolutions.com
> 803.405.1008
> ----------------------------------------------
> www.CraftedSolutions.com
> Crafted Solutions, Inc.
> Web Design & Development
> Web Site Hosting & Custom Solutions
> "Get the results the Internet promises;
>  get the 'Net Result' from Crafted Solutions!"
> ----------------------------------------------
> 
> 
> 
> Norman Bunn wrote:
> > I have a calendar app running on my server that bots are accessing
> > in spite of my robots.txt file saying otherwise (the app is in
> > cgi-bin). Specifically, MSN and Yahoo are the problem. I had this
> > problem several years back, and the update to the robots.txt file fixed
> > it then, but not now. What other avenues are available to me? Here are
> > the contents:
> >
> > User-agent: *
> > Disallow: /administrator/
> > Disallow: /cache/
> > Disallow: /components/
> > Disallow: /editor/
> > Disallow: /help/
> > Disallow: /images/
> > Disallow: /includes/
> > Disallow: /language/
> > Disallow: /mambots/
> > Disallow: /media/
> > Disallow: /modules/
> > Disallow: /templates/
> > Disallow: /installation/
> > Disallow: /cgi-bin/
> > User-agent: ia_archiver
> > Disallow: /
> > User-agent: msnbot
> > Crawl-delay: 30
> >
> > Thanks,
> >
> > Norman
> >
> >   


