[thelist] Bots killing my server
Norman Bunn
norman.bunn at craftedsolutions.com
Wed Feb 6 15:05:17 CST 2008
Apache on Red Hat Linux.
Yes, http://www.aroundnewberry.com/robots.txt
Yes, they are case-sensitive, and I always try to use lower case
throughout a site for just that reason; it cuts down on the confusion.
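Strictly speaking, the original robots.txt convention treats field names such as "User-agent" and "Disallow" as case-insensitive. It is the paths that are compared case-sensitively, which is why consistent lower case helps. Python's standard urllib.robotparser, used here only as a stand-in for the search engines' own parsers (whose exact behavior is an assumption), shows the distinction. The crawler name ExampleBot is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Field names deliberately in upper case; the path in lower case.
ROBOTS_TXT = """\
USER-AGENT: *
DISALLOW: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly, no network fetch

# The upper-case field names were still recognized, so this path is blocked...
print(parser.can_fetch("ExampleBot", "/cgi-bin/Calcium38.pl"))  # False
# ...but the path is matched as a case-sensitive prefix, so this one is not.
print(parser.can_fetch("ExampleBot", "/CGI-BIN/Calcium38.pl"))  # True
```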
Anthony Baratta wrote:
> Are you on Windows or UNIX?
>
> Can you access the robots.txt file directly via your web browser?
>
> http://www.foo.org/robots.txt
>
> It's been a while since I've worked with robots.txt. I would double-check the syntax of the robots.txt file. Are the commands case-sensitive?
>
> One path of last resort is to use rewrite rules that check the User-Agent header and deny the bots access to the target directories.
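The user-agent check described above reduces to a small matching decision. In Apache it would typically be a mod_rewrite rule that tests %{HTTP_USER_AGENT} and the requested URL, then refuses the request with 403 Forbidden. The Python below is a sketch of that matching logic only, not server configuration; the patterns come from the log entries in this thread, and the helper name should_deny is hypothetical.

```python
import re

# Crawler tokens taken from the user-agent strings in this thread's logs.
BLOCKED_AGENTS = re.compile(r"msnbot|Yahoo! Slurp", re.IGNORECASE)

# Calendar requests: the CGI script itself, or any URL carrying a
# CalendarName= parameter (the Yahoo requests used "/?CalendarName=...").
CALENDAR_REQUEST = re.compile(r"^/cgi-bin/Calcium38\.pl|[?&]CalendarName=")


def should_deny(user_agent: str, request_uri: str) -> bool:
    """Hypothetical helper: True when a listed crawler asks for the calendar.

    In Apache the same decision would be made by a rewrite rule answering
    403 Forbidden; this function only models the matching logic.
    """
    return bool(BLOCKED_AGENTS.search(user_agent)) and bool(
        CALENDAR_REQUEST.search(request_uri)
    )


print(should_deny("msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
                  "/cgi-bin/Calcium38.pl?Op=ShowIt&Amount=Month"))  # True
print(should_deny("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
                  "/cgi-bin/Calcium38.pl?CalendarName=LittleMountain"))  # False
```

As the second call shows, this approach only catches crawlers that identify themselves: a request with a browser-style agent string, like the MSIE entry logged from 65.55.165.13, passes straight through.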
>
> -----Original message-----
> From: Norman Bunn norman.bunn at craftedsolutions.com
> Date: Wed, 06 Feb 2008 12:25:22 -0800
> To: "thelist at lists.evolt.org" thelist at lists.evolt.org
> Subject: Re: [thelist] Bots killing my server
>
>
>> After re-reading, I realize some might think the robots.txt file is in
>> the cgi-bin directory, but no, it is in the web root. MSN log
>> entries look like this:
>>
>> 65.55.165.13 - - [05/Feb/2008:04:07:29 -0500] "GET
>> /cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt&Date=2008/7/1&Amount=Month&NavType=Both&Type=Block
>> HTTP/1.0" 200 19941
>> "http://search.live.com/results.aspx?q=calendar&mrt=en-us&FORM=LIVSOP"
>> "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
>>
>> 65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1"
>> 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET
>> /cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1
>> HTTP/1.1" 200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /robots.txt HTTP/1.1"
>> 200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> 65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET
>> /cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt&Amount=Month&NavType=Absolute&Type=Block&Date=2007%2F6%2F1
>> HTTP/1.1" 200 20968 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
>>
>> The misbehaving Yahoo bot is trying to access the calendar by using "/?"
>> instead of /cgi-bin/:
>>
>> 74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET
>> /?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5
>> HTTP/1.0" 200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
>> http://help.yahoo.com/help/us/ysearch/slurp)"
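The email client wrapped each log entry above across several lines. Apache writes one entry per line, and these entries carry referer and user-agent fields, consistent with the "combined" log format. Assuming that format, a short Python sketch can tally how often each crawler requests the calendar, which is a quick way to confirm whether a robots.txt or rewrite change is taking effect. The names calendar_hits and crawler_name are hypothetical; the sample lines are three entries from this thread, re-joined.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_name(agent: str) -> str:
    """Map a User-Agent string to the crawler names seen in this thread."""
    lowered = agent.lower()
    if "msnbot" in lowered:
        return "msnbot"
    if "slurp" in lowered:
        return "Yahoo! Slurp"
    return "other"


def calendar_hits(lines):
    """Count calendar requests per crawler; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue
        uri = match.group("uri")
        if "Calcium38.pl" in uri or "CalendarName=" in uri:
            counts[crawler_name(match.group("agent"))] += 1
    return counts


# Three entries from this thread, each re-joined onto a single line.
SAMPLE = [
    '65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1" '
    '200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET /cgi-bin/Calcium38.pl'
    '?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month'
    '&NavType=Absolute&Type=Block&Date=2008/4/1 HTTP/1.1" 200 17778 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET /?CalendarName=Around_Newberry'
    '&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5 HTTP/1.0" '
    '200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
]

print(dict(calendar_hits(SAMPLE)))  # {'msnbot': 1, 'Yahoo! Slurp': 1}
```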
>>
>> Any suggestions on stopping these?
>>
>> ---
>>
>> Norman W. Bunn
>> norman.bunn at craftedsolutions.com
>> 803.405.1008
>> ----------------------------------------------
>> www.CraftedSolutions.com
>> Crafted Solutions, Inc.
>> Web Design & Development
>> Web Site Hosting & Custom Solutions
>> "Get the results the Internet promises;
>> get the 'Net Result' from Crafted Solutions!"
>> ----------------------------------------------
>>
>>
>>
>> Norman Bunn wrote:
>>
>>> I have a calendar app running on my server that bots are
>>> accessing in spite of my robots.txt file saying otherwise (the app is in
>>> cgi-bin). Specifically, MSN and Yahoo are the problem. I had this
>>> problem several years back, and updating the robots.txt file fixed it
>>> then, but not now. What other avenues are available to me? Here are the
>>> contents:
>>>
>>> User-agent: *
>>> Disallow: /administrator/
>>> Disallow: /cache/
>>> Disallow: /components/
>>> Disallow: /editor/
>>> Disallow: /help/
>>> Disallow: /images/
>>> Disallow: /includes/
>>> Disallow: /language/
>>> Disallow: /mambots/
>>> Disallow: /media/
>>> Disallow: /modules/
>>> Disallow: /templates/
>>> Disallow: /installation/
>>> Disallow: /cgi-bin/
>>> User-agent: ia_archiver
>>> Disallow: /
>>> User-agent: msnbot
>>> Crawl-delay: 30
>>>
>>> Thanks,
>>>
>>> Norman
>>>
>>>
>>>
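The robots.txt above can be checked with Python's standard urllib.robotparser, used here as a stand-in for the crawlers' own parsers (whose exact behavior is an assumption). Under the robots.txt convention, a crawler obeys the record that names it and falls back to "User-agent: *" only when no record does. Because the msnbot record contains only Crawl-delay, msnbot is not barred from anything, /cgi-bin/ included. Yahoo's Slurp does fall back to the "*" record, but no line in that record covers the "/?CalendarName=..." URLs it is requesting. REVISED is a hypothetical, trimmed fix: it separates records with blank lines, as the original convention expects, and repeats the Disallow line inside the msnbot record.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt posted in this thread, as quoted.
POSTED = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /cgi-bin/
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Crawl-delay: 30
"""

# Hypothetical revision (trimmed): blank lines between records, and the
# msnbot record carries its own Disallow line. A real file would repeat
# every Disallow line from the "*" record here.
REVISED = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: msnbot
Crawl-delay: 30
Disallow: /cgi-bin/
"""

# Request paths taken from the log entries in this thread (shortened).
CALENDAR = "/cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt"
ROOT_QUERY = "/?CalendarName=Around_Newberry&Op=ShowIt"


def load(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


posted = load(POSTED)
revised = load(REVISED)

print(posted.can_fetch("msnbot", CALENDAR))   # True: msnbot obeys only its own record
print(posted.crawl_delay("msnbot"))           # 30
print(posted.can_fetch("Slurp", CALENDAR))    # False: Slurp falls back to "*"
print(posted.can_fetch("Slurp", ROOT_QUERY))  # True: no Disallow line covers "/?"
print(revised.can_fetch("msnbot", CALENDAR))  # False
```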