[thelist] Bots killing my server

Norman Bunn norman.bunn at craftedsolutions.com
Wed Feb 6 15:05:17 CST 2008


Apache on Redhat Linux.

Yes, http://www.aroundnewberry.com/robots.txt

Yes, they are case-sensitive, and I always try to use lower case 
throughout a site for just that reason; it cuts down on the confusion.

Anthony Baratta wrote:
> Are you on Windows or UNIX?
>
> Can you access the robots.txt file directly via your web browser?
>
> http://www.foo.org/robots.txt
>
> It's been a while since I've worked with Robots.txt. I would check on the syntax for the robots.txt file. Are the commands case sensitive?
>
> One path of last resort is to use rewrite rules that check the User-Agent string and deny the bots access to the target directories.
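
To make the last-resort idea concrete: in Apache this would normally be done with mod_rewrite conditions on the User-Agent header, but the decision itself is simple enough to sketch in Python. This is only an illustration of the matching logic, not server configuration. The function and constant names are hypothetical; the agent tokens and paths are the ones that show up in the log entries in this thread.

```python
# Hypothetical names; the tokens and paths are taken from the quoted logs.
BLOCKED_AGENT_TOKENS = ("msnbot", "slurp")
PROTECTED_PREFIXES = ("/cgi-bin/",)


def should_deny(user_agent, path, query=""):
    """Return True if a request from this agent should be refused."""
    agent = user_agent.lower()
    # Only requests from the named crawlers are candidates for blocking.
    if not any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return False
    # Anything under a protected directory is refused outright.
    if path.startswith(PROTECTED_PREFIXES):
        return True
    # Calendar requests that arrive at the site root as "/?CalendarName=...".
    return path == "/" and "CalendarName=" in query


print(should_deny("msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
                  "/cgi-bin/Calcium38.pl", "Op=ShowIt"))
```

Matching on the User-Agent only helps against crawlers that identify themselves, so a request that presents a browser string would pass straight through this check.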
>
> -----Original message-----
> From: Norman Bunn norman.bunn at craftedsolutions.com
> Date: Wed, 06 Feb 2008 12:25:22 -0800
> To: "thelist at lists.evolt.org" thelist at lists.evolt.org
> Subject: Re: [thelist] Bots killing my server
>
>   
>> After re-reading, I realize that some might think the robots.txt file is in 
>> the cgi-bin directory, but no, it is in the http root.  An MSN log 
>> entry looks like this:
>>
>> 65.55.165.13 - - [05/Feb/2008:04:07:29 -0500] "GET 
>> /cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt&Date=2008/7/1&Amount=Month&NavType=Both&Type=Block 
>> HTTP/1.0" 200 19941 
>> "http://search.live.com/results.aspx?q=calendar&mrt=en-us&FORM=LIVSOP" 
>> "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
>>
>> 65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" 
>> 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET 
>> /cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1 
>> HTTP/1.1" 200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /robots.txt HTTP/1.1" 
>> 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET 
>> /cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt&Amount=Month&NavType=Absolute&Type=Block&Date=2007%2F6%2F1 
>> HTTP/1.1" 200 20968 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> The misbehaving Yahoo bot is trying to access the calendar by using "/?" 
>> instead of /cgi-bin/
>>
>> 74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET 
>> /?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5 
>> HTTP/1.0" 200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; 
>> http://help.yahoo.com/help/us/ysearch/slurp)"
>>
>> Any suggestions on stopping these?
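
Before blocking anything, it may help to count which agents are actually requesting the calendar. The sketch below assumes the access log is in Apache's combined format, which is what the entries quoted above look like. The helper names are hypothetical, and the sample lines are three of the quoted entries rejoined onto single lines.

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "method path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def calendar_hits_by_agent(lines):
    """Count calendar requests (any path carrying CalendarName=) per agent."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and "CalendarName=" in entry["path"]:
            hits[entry["agent"]] += 1
    return hits


sample = [
    '65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" '
    '200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET '
    '/cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member'
    '&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1 HTTP/1.1" '
    '200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET '
    '/?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both'
    '&Type=Condensed&Date=2008%2F2%2F5 HTTP/1.0" 200 20017 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
]

for agent, count in calendar_hits_by_agent(sample).items():
    print(count, agent)
```

Note that the first entry quoted above (65.55.165.13) reports an MSIE browser string and a search.live.com referer rather than msnbot, so it would show up in such a tally as a browser, not as a crawler.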
>>
>> ---
>>
>> Norman W. Bunn
>> norman.bunn at craftedsolutions.com
>> 803.405.1008
>> ----------------------------------------------
>> www.CraftedSolutions.com
>> Crafted Solutions, Inc.
>> Web Design & Development
>> Web Site Hosting & Custom Solutions
>> "Get the results the Internet promises;
>>  get the 'Net Result' from Crafted Solutions!"
>> ----------------------------------------------
>>
>>
>>
>> Norman Bunn wrote:
>>     
>>> I have a calendar app that is running on my server that bots are 
>>> accessing in spite of my robots.txt file saying otherwise (it's in 
>>> cgi-bin).  Specifically, MSN and Yahoo are the problem.  I had this 
>>> problem several years back and the update to the robots.txt file fixed it 
>>> then, but not now.  What other avenues are available to me?  Here are the 
>>> contents:
>>>
>>> User-agent: *
>>> Disallow: /administrator/
>>> Disallow: /cache/
>>> Disallow: /components/
>>> Disallow: /editor/
>>> Disallow: /help/
>>> Disallow: /images/
>>> Disallow: /includes/
>>> Disallow: /language/
>>> Disallow: /mambots/
>>> Disallow: /media/
>>> Disallow: /modules/
>>> Disallow: /templates/
>>> Disallow: /installation/
>>> Disallow: /cgi-bin/
>>> User-agent: ia_archiver
>>> Disallow: /
>>> User-agent: msnbot
>>> Crawl-delay: 30
>>>
>>> Thanks,
>>>
>>> Norman
>>>
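
One thing worth checking in the file above: under the robots exclusion convention a crawler obeys the record that names it and falls back to the "*" record only when no named record matches. The msnbot record here contains only Crawl-delay, so a parser following that rule sees no Disallow lines for msnbot at all. The sketch below uses Python's standard urllib.robotparser as a stand-in to show this; it is an assumption that msnbot reads the file the same way. The file is abridged to the lines that matter, and the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cgi-bin/
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Crawl-delay: 30
"""


def allowed(user_agent, path):
    """Return True if Python's parser thinks user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


calendar = "/cgi-bin/Calcium38.pl"

# msnbot matches its own record, which has no Disallow lines.
print(allowed("msnbot/1.0", calendar))

# Slurp has no record of its own, so the "*" record applies.
print(allowed("Slurp", calendar))

# The root URL with a query string is not covered by any Disallow prefix.
print(allowed("Slurp", "/?CalendarName=Around_Newberry"))
```

With this parser the three checks come out as allowed, blocked, and allowed. If the real crawlers behave the same way, repeating the Disallow lines inside the msnbot record would be one thing to try, and the "/?CalendarName=" requests would need a rule or a server-side block of their own, since "Disallow: /cgi-bin/" does not match them.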
>>>   
>>>       


