[thelist] Robots.txt
Seb Barre
sebastien at oven.com
Mon Dec 4 17:12:28 CST 2000
At 08:59 AM 12/5/2000 +1000, Adrian Fischer wrote:
>Hi Guys,
>
>I know if you want to exclude robots from your site you can include a txt
>file to do it. I don't have such a file but found this in my error log:
> File does not exist: /homeblah/blah.com/robots.txt
>
>Any ideas how it got there or what it means?
I get those a lot in my log files as well. Basically, a crawler (robot) hit
your site and requested robots.txt to see what it could and couldn't
crawl. Since your server didn't have one, your webserver logged a "file
not found" error (same as if a human user had requested a page that doesn't
exist). For the record, with no robots.txt to restrict it, the robot was
free to crawl your entire site.
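To make that concrete, here is a rough sketch in Python of the decision a polite crawler makes. It uses the standard library's urllib.robotparser; the helper name rules_from_response, the bot name "ExampleBot" and the example.com URL are made up for illustration, and real crawlers may handle edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def rules_from_response(status, body=""):
    """Build a rule set from a (hypothetical) HTTP response for /robots.txt."""
    parser = RobotFileParser()
    if status == 404:
        # No robots.txt on the server: an empty rule set restricts nothing.
        parser.parse([])
    else:
        # A robots.txt was served: parse its lines into rules.
        parser.parse(body.splitlines())
    return parser


# Simulate the situation from the error log: the file does not exist.
missing = rules_from_response(404)
print(missing.can_fetch("ExampleBot", "http://www.example.com/any/page.html"))
```

With no rules to consult, can_fetch should report that the page is allowed, which is why the missing file costs you nothing but a log entry.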
There is a spec out there (the Robots Exclusion Standard) that describes
robots.txt files in detail if you want to customize one to your site
structure or exclude only certain robots, but if you'd rather not have your
site crawled at all by any (robots.txt-respecting) crawlers, you can use mine:
-- snip ---- snip ---- snip -- (don't include this line)
User-agent: *
Disallow: /
-- snip ---- snip ---- snip -- (don't include this line)
Just drop a plain text file named robots.txt with the content above in your
site's root folder (so it is served at /robots.txt) and well-behaved crawlers
will ignore your site.
Obviously this is a _bad_ idea for any type of commercial site since you
want to be indexed, but for personal or development sites, it can be useful.
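If you want to check what a given robots.txt does before uploading it, something like the following Python sketch should work. It feeds rules to the standard library's urllib.robotparser and asks what each bot may fetch. The bot names "BadBot" and "OtherBot", the /private/ path and the example.com URLs are hypothetical, and the second rule set is only one possible way to exclude a single robot.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The blanket block: every robot, every path.
block_all = parse_rules("""\
User-agent: *
Disallow: /
""")

# A selective variant (hypothetical): shut out one robot entirely,
# and keep everyone else out of a single directory.
selective = parse_rules("""\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
""")

home = "http://www.example.com/index.html"
private = "http://www.example.com/private/notes.html"

print(block_all.can_fetch("OtherBot", home))    # blocked for everyone
print(selective.can_fetch("BadBot", home))      # BadBot is blocked everywhere
print(selective.can_fetch("OtherBot", home))    # other robots may fetch this
print(selective.can_fetch("OtherBot", private)) # but not the private directory
```

Keep in mind this only tells you what a compliant crawler would do; nothing in robots.txt stops a robot that chooses to ignore it.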
--- -- -
Seb Barre - seb at oven.com
OVEN Digital Toronto
Work: 416-595-9750 x 222
Mobile: 416-254-5078
http://www.oven.com/