[thelist] Robots.txt

Seb Barre sebastien at oven.com
Mon Dec 4 17:12:28 CST 2000


At 08:59 AM 12/5/2000 +1000, Adrian Fischer wrote:
>Hi Guys,
>
>I know if you want to exclude robots from your site you can include a txt
>file to do it.  I don't have such a file but found this in my error log:
>  File does not exist: /homeblah/blah.com/robots.txt
>
>Any ideas how it got there or what it means?

I get those a lot in my log files as well.  Basically, a crawler (robot) hit 
your site and requested a robots.txt file to see what it could and couldn't 
crawl.  Since your server didn't have one, your webserver logged a "file 
not found" error (same as if a human user had requested a page that doesn't 
exist).  For the record, with no robots.txt to restrict it, the robot was 
free to crawl your entire site.
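
If it helps to see that logic spelled out, here is a rough Python sketch of 
what a well-behaved crawler might do with the response to its /robots.txt 
request.  The helper rules_for(), the bot name "ExampleBot" and the 
example.com URL are all made up for illustration; only RobotFileParser 
comes from Python's standard library, and real crawlers will differ.

```python
from urllib.robotparser import RobotFileParser


def rules_for(status: int, body: str = "") -> RobotFileParser:
    """Build crawl rules from a (hypothetical) /robots.txt response."""
    parser = RobotFileParser()
    if status == 404:
        # No robots.txt on the server: there are no rules to obey.
        parser.parse([])
    else:
        # Otherwise, read the rules from the file's contents.
        parser.parse(body.splitlines())
    return parser


# The server answered 404 (and logged "File does not exist").
missing = rules_for(404)

# With no rules at all, every URL is fair game for the crawler.
print(missing.can_fetch("ExampleBot", "http://www.example.com/index.html"))
```

So the log entry is harmless: it only records that a robot asked for the 
file before crawling.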

The Robots Exclusion Protocol spec (RFC 9309) describes robots.txt files in 
detail if you want to customize one to your site structure or exclude only 
certain robots, but if you'd rather not have your site crawled at all by 
any (robots.txt-respecting) crawlers, you can use mine:

-- snip ---- snip ---- snip -- (don't include this line)
User-agent: *
Disallow: /
-- snip ---- snip ---- snip -- (don't include this line)

Just drop a text file named robots.txt with the content above in your 
site's root folder (so it is served at /robots.txt) and well-behaved 
crawlers will ignore your site.
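
If you want to double-check the file before uploading it, one option is to 
feed the same two lines to Python's standard-library robots.txt parser. 
This is only a sanity check under my own assumptions: the bot names and the 
example.com URL are placeholders, and it says nothing about crawlers that 
don't respect robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the robots.txt file in this message.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Any crawler name matches "*", and every path starts with "/",
# so each check should report that fetching is not allowed.
for bot in ("ExampleBot", "AnotherBot"):
    allowed = parser.can_fetch(bot, "http://www.example.com/any/page.html")
    print(bot, allowed)
```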

Obviously this is a _bad_ idea for any type of commercial site since you 
want to be indexed, but for personal or development sites, it can be useful.
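
For a site that does want to be indexed, a middle ground is to exclude only 
certain robots or certain folders.  Here is a small sketch of that, again 
checked with Python's standard-library parser.  "BadBot", "GoodBot", the 
/private/ folder and the example.com URLs are invented for the example, so 
swap in whatever applies to your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one robot entirely, and keep
# everyone else out of a single folder.
SELECTIVE = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SELECTIVE.splitlines())

# (robot name, URL) pairs to check against the rules above.
checks = [
    ("BadBot", "http://www.example.com/index.html"),
    ("GoodBot", "http://www.example.com/index.html"),
    ("GoodBot", "http://www.example.com/private/notes.html"),
]
results = [parser.can_fetch(bot, url) for bot, url in checks]

for (bot, url), allowed in zip(checks, results):
    print(bot, url, "allowed" if allowed else "blocked")
```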


--- -- -
Seb Barre - seb at oven.com
OVEN Digital Toronto
Work: 416-595-9750 x 222
Mobile: 416-254-5078
http://www.oven.com/




