[thelist] responsiveness from "the coffee shop"

Bob Meetin bobm at dottedi.biz
Wed Nov 17 09:37:42 CST 2010

Hassan Schroeder wrote:
> On Tue, Nov 16, 2010 at 12:39 PM, Bob Meetin <bobm at dottedi.biz> wrote:
>> There were about 40 connections from and another 20 or so
>> from, Greece and the U.S. I could block these for the future,
>> but how do I know if they are valid.
> Does your logging capture user-agent info? Might be a spider; even if
> the UA string isn't obviously one, look at the requested URLs to see if
> they resemble typical spider requests.
> Also look at the interval between requests; you might as well block a
> badly behaved spider that's hitting you with a lot of requests at once...
> HTH,
- - [16/Nov/2010:14:17:03 -0600] "GET /index.php?option=com_content&view=article&id=110&Itemid=131 HTTP/1.0" 404 - "http://www.whatever-website.org" "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.6.30 Version/10.62"

- - [16/Nov/2010:14:17:03 -0600] "GET /index.php?option=com_wordpress&tag=dog-health HTTP/1.1" 404 - "http://www.whatever-website.org" "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.6.30 Version/10.62"

I changed the website name, but there were hundreds of lines like the
above, with almost no interval between them. I had previously modified
all my websites' robots.txt files to add crawl delays, in theory, and
even did the equivalent for Google via their control panel, but bad
spiders don't obey the rules anyway.
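For anyone wanting to measure the burst rather than eyeball it, here is a
rough Python sketch that groups log lines by client and user agent, then
reports how many requests each made, how many were 404s, and over how
many seconds. It assumes the server writes Apache's "combined" log
format. The 192.0.2.10 address in the sample is a documentation
placeholder, not the real client, and the names LOG_RE, summarize and
SAMPLE are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Apache "combined" format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Group requests by (host, user agent) and summarize each group."""
    groups = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        groups[(m.group("host"), m.group("agent"))].append(
            (when, m.group("status"))
        )

    report = {}
    for key, hits in groups.items():
        times = sorted(t for t, _ in hits)
        report[key] = {
            "requests": len(hits),
            "not_found": sum(1 for _, status in hits if status == "404"),
            # Seconds between the first and last request in the group.
            "span_seconds": (times[-1] - times[0]).total_seconds(),
        }
    return report


# Hypothetical sample: the two requests above with a placeholder address.
UA = "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.6.30 Version/10.62"
SAMPLE = [
    '192.0.2.10 - - [16/Nov/2010:14:17:03 -0600] '
    '"GET /index.php?option=com_content&view=article&id=110&Itemid=131 HTTP/1.0" '
    '404 - "http://www.whatever-website.org" "' + UA + '"',
    '192.0.2.10 - - [16/Nov/2010:14:17:03 -0600] '
    '"GET /index.php?option=com_wordpress&tag=dog-health HTTP/1.1" '
    '404 - "http://www.whatever-website.org" "' + UA + '"',
]

if __name__ == "__main__":
    for (host, agent), stats in summarize(SAMPLE).items():
        print(host, agent, stats)
```

A group with many requests, mostly 404s, packed into a span of a second
or two is a reasonable candidate for blocking. What counts as "many" is
a judgment call for your own traffic.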

Is there another source where I can check to identify the culprit? To 
narrow it down to who, what, etc? A 'whoisthat' database?
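The closest thing to a 'whoisthat' database that I know of is reverse
DNS plus a WHOIS lookup on the client address. The sketch below does
both with Python's standard socket module. It assumes that whois.iana.org
answers an IP query with a "refer:" line naming the regional registry,
and that the registry accepts a bare IP address as the query. The
function names are made up for this example. Expect the name of the ISP
or hosting company that holds the address block, not the person behind
the requests.

```python
import socket


def whois_query(query, server="whois.iana.org", timeout=10):
    """Send one WHOIS query (TCP port 43, query + CRLF) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(reply):
    """Return the server named on a 'refer:' line, or None if absent."""
    for line in reply.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "refer" and value.strip():
            return value.strip()
    return None


def reverse_dns(ip):
    """Return the PTR host name for ip, or None if there is no record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def identify(ip):
    """Return (reverse DNS name, WHOIS text) for a client address."""
    reply = whois_query(ip)
    referral = find_referral(reply)
    if referral:
        # Ask the registry that IANA pointed to for the detailed record.
        reply = whois_query(ip, server=referral)
    return reverse_dns(ip), reply
```

Calling identify() with an address copied from the access log needs
outbound access to port 43, which some hosts block.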

Bob Meetin
dotted i
303-926-0167 (home/business)
