[thelist] December stats for evolt.org

Daniel J. Cody djc at starkmedia.com
Fri Jan 4 18:40:53 CST 2002

MacEdition's CodeBitch wrote:

> Thanks,
> May I use this information in a future column?
Sure thing, so long as credit goes where credit is due and all that.

>>btw, since we're talking about it.. do you see Inktomi's slurp bot
>>accessing directories that are disallowed in the robots.txt file? I've
>>been seeing it in the last month off and on.. would hate to block them
>>out like the rest of the spambots, but they're archiving things they
>>shouldn't be.. just wondering :)
> Have to admit I haven't tracked this. Slurp used to report as Mozilla 3.0
> but I'm seeing tags with Mozilla 5 in there as well. This could be a
> different bot masquerading as Slurp masquerading as Mozilla. Ech.

Yeah, it's reporting as 5 now, although it still contains the 'slurp'
string in its user-agent, making it easy to filter out. And the IPs
resolve back to Inktomi as well. Here's what I'm seeing:

"Mozilla/5.0 (Slurp/cat; slurp at inktomi.com; http://www.inktomi.com/slurp.html)"
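A minimal Python sketch of the two checks described above: matching the 'slurp' token in the User-Agent, and confirming that the requesting IP reverse-resolves to Inktomi. The hostname suffixes in DEFAULT_SUFFIXES are assumptions (the post only says the IPs "resolve back to inktomi"), and the forward-lookup step is a common extra safeguard against spoofed reverse DNS, not something described in this thread.

```python
import socket

# ASSUMPTION: domains the crawler's reverse DNS is expected to end in.
# The post only says the IPs resolve back to Inktomi; adjust as needed.
DEFAULT_SUFFIXES = (".inktomi.com", ".inktomisearch.com")


def is_slurp_agent(user_agent: str) -> bool:
    """True if the User-Agent string contains 'slurp' (case-insensitive)."""
    return "slurp" in user_agent.lower()


def hostname_matches(hostname: str, suffixes=DEFAULT_SUFFIXES) -> bool:
    """True if a reverse-DNS hostname ends with one of the given suffixes."""
    host = hostname.lower().rstrip(".")
    return any(host.endswith(s) for s in suffixes)


def verify_crawler_ip(ip: str, suffixes=DEFAULT_SUFFIXES) -> bool:
    """Reverse-resolve ip and check its domain, then forward-resolve that
    hostname and confirm it maps back to the same ip."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

verify_crawler_ip() does live DNS lookups, so in practice it would run once per distinct IP rather than once per log line.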

From the Slurp website:
" Disallowed documents, including slash (the home page of the site), are 
not indexed, nor are links in those documents followed. Slurp does read 
slash at each site and uses it internally, but if it is disallowed it is 
neither indexed nor followed."

And yet I see two consecutive lines like this:

- - [04/Jan/2002:13:51:44 -0600] "GET /robots.txt HTTP/1.0" 404 17523 "-" "Mozilla/5.0 (Slurp/cat; slurp at inktomi.com; http://www.inktomi.com/slurp.html)"
- - [04/Jan/2002:13:52:03 -0600] "GET /user/foo/000 HTTP/1.0" 404 12669 "-" "Mozilla/5.0 (Slurp/cat; slurp at inktomi.com;
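Lines like these look like Apache's "combined" log format (host, ident, user, [time], "request", status, bytes, "referer", "user-agent"); the host field appears to have been stripped from the excerpt. A minimal Python sketch for pulling the fields out of a full log line, assuming that format; the regex and parse_line() are illustrative helpers, not part of any particular log tool.

```python
import re
from datetime import datetime

# ASSUMPTION: Apache "combined" format:
# host ident authuser [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields for one combined-format line, or None."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A '-' byte count means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # e.g. "04/Jan/2002:13:52:03 -0600" -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry
```

Filtering the parsed entries on entry["agent"] and entry["path"] then gives the list of bot requests to compare against robots.txt.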

The robots.txt file of evolt.org contains:

User-agent: *
Disallow: /user/
Disallow: /email-addresses/
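Python's standard-library urllib.robotparser can confirm what these rules mean for Slurp: under the "User-agent: *" record, anything starting with /user/ or /email-addresses/ is off-limits. A minimal sketch that feeds the rules in as a string rather than fetching them from the site:

```python
from urllib.robotparser import RobotFileParser

# The rules from evolt.org's robots.txt, as quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user/
Disallow: /email-addresses/
"""

SLURP_UA = ("Mozilla/5.0 (Slurp/cat; slurp at inktomi.com; "
            "http://www.inktomi.com/slurp.html)")

parser = RobotFileParser()
# parse() takes an iterable of lines instead of fetching robots.txt over HTTP.
parser.parse(ROBOTS_TXT.splitlines())

paths = ["/", "/robots.txt", "/user/foo/000", "/email-addresses/"]
results = {path: parser.can_fetch(SLURP_UA, path) for path in paths}
print(results)
# {'/': True, '/robots.txt': True, '/user/foo/000': False, '/email-addresses/': False}
```

This only shows how Python's parser reads the file; it says nothing about how Slurp itself interprets it.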

Goes to show how much you can believe what some companies would have you believe.

