[thelist] Bots killing my server
Norman Bunn
norman.bunn at craftedsolutions.com
Wed Feb 6 14:25:22 CST 2008
After re-reading my post, I realize some might think the robots.txt file
is in the cgi-bin directory. It is not; it is in the HTTP root. An MSN
log entry looks like this:
65.55.165.13 - - [05/Feb/2008:04:07:29 -0500] "GET
/cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt&Date=2008/7/1&Amount=Month&NavType=Both&Type=Block
HTTP/1.0" 200 19941
"http://search.live.com/results.aspx?q=calendar&mrt=en-us&FORM=LIVSOP"
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.208.26 - - [05/Feb/2008:04:08:12 -0500] "GET /robots.txt HTTP/1.1"
200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.26 - - [05/Feb/2008:04:08:13 -0500] "GET
/cgi-bin/Calcium38.pl?Op=ShowIt&CookieParams=1&CalendarName=Newberry_Parks_Member&Amount=Month&NavType=Absolute&Type=Block&Date=2008/4/1
HTTP/1.1" 200 17778 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET /robots.txt HTTP/1.1"
200 397 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.34 - - [05/Feb/2008:04:08:26 -0500] "GET
/cgi-bin/Calcium38.pl?CalendarName=Colony_Lutheran_Member&Op=ShowIt&Amount=Month&NavType=Absolute&Type=Block&Date=2007%2F6%2F1
HTTP/1.1" 200 20968 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
The misbehaving Yahoo bot is trying to access the calendar by using "/?"
instead of /cgi-bin/:
74.6.24.138 - - [06/Feb/2008:03:56:55 -0500] "GET
/?CalendarName=Around_Newberry&Op=ShowIt&Amount=Week&NavType=Both&Type=Condensed&Date=2008%2F2%2F5
HTTP/1.0" 200 20017 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
http://help.yahoo.com/help/us/ysearch/slurp)"
Any suggestions on stopping these?
---
Norman W. Bunn
norman.bunn at craftedsolutions.com
803.405.1008
----------------------------------------------
www.CraftedSolutions.com
Crafted Solutions, Inc.
Web Design & Development
Web Site Hosting & Custom Solutions
"Get the results the Internet promises;
get the 'Net Result' from Crafted Solutions!"
----------------------------------------------
Norman Bunn wrote:
> I have a calendar app that is running on my server that bots are
> accessing in spite of my robots.txt file saying otherwise (it's in
> cgi-bin). Specifically, MSN and Yahoo are the problem. I had this
> problem several years back and the update to the robots.txt file fixed it
> then, but not now. What other avenues are available to me? Here's the
> contents:
>
> User-agent: *
> Disallow: /administrator/
> Disallow: /cache/
> Disallow: /components/
> Disallow: /editor/
> Disallow: /help/
> Disallow: /images/
> Disallow: /includes/
> Disallow: /language/
> Disallow: /mambots/
> Disallow: /media/
> Disallow: /modules/
> Disallow: /templates/
> Disallow: /installation/
> Disallow: /cgi-bin/
> User-agent: ia_archiver
> Disallow: /
> User-agent: msnbot
> Crawl-delay: 30
>
> Thanks,
>
> Norman
>
>
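
One way to sanity-check the robots.txt quoted above is to run it through a parser offline and ask what it permits for each crawler. The sketch below uses Python's standard urllib.robotparser. The helper name build_parser, the short agent tokens ("msnbot", "Slurp") and the shortened query strings are illustrative choices, not taken from the server, and a real crawler may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted in the message, without the "> " quote prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /cgi-bin/
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Crawl-delay: 30
"""


def build_parser(text):
    """Parse robots.txt text (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Request paths modelled on the log entries; query strings are shortened.
CGI_URL = "/cgi-bin/Calcium38.pl?CalendarName=LittleMountain&Op=ShowIt"
ROOT_URL = "/?CalendarName=Around_Newberry&Op=ShowIt"

for agent in ("msnbot", "Slurp", "ia_archiver"):
    for url in (CGI_URL, ROOT_URL):
        # can_fetch() returns True when the parsed rules allow the request.
        print(agent, url, parser.can_fetch(agent, url))

print("msnbot crawl delay:", parser.crawl_delay("msnbot"))
```

Under this parser, a crawler that matches a named group uses only that group and does not inherit the rules under "User-agent: *". The msnbot group here contains only Crawl-delay, so the parser reports /cgi-bin/ as allowed for msnbot. The "/?..." requests are not under /cgi-bin/ at all, so no Disallow line in the file matches them for any agent other than ia_archiver. If the crawlers read the file the same way, repeating "Disallow: /cgi-bin/" inside the msnbot group and adding a rule for the root query form (for example "Disallow: /?") would be worth trying, though that is an untested assumption.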