[thelist] robots.txt

Simon Willison cs1spw at bath.ac.uk
Sun Nov 16 22:50:35 CST 2003


Diane Soini wrote:
> How well is the robots.txt file obeyed?

Search engines and other responsible net citizens generally support it 
extremely well. Spam harvesting scripts and other spiders with 
less-than-noble intent ignore it completely. How useful it is depends on 
what you are trying to keep out of the index, and why.
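
As a rough sketch of what "supporting it" means in practice: a 
well-behaved spider checks the site's robots.txt rules before it fetches 
a URL. The example below uses Python's standard urllib.robotparser 
module. The rules, the "ExampleBot" user agent and the example.com URLs 
are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Nothing forces a spider to run a check like this. It is purely 
voluntary, which is why the badly behaved ones simply skip it.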

One critical thing to remember is that robots.txt is not, and never has 
been, a security tool. Anyone can read the file, so every path it lists 
is advertised to exactly the people you want to keep out. Remember when 
the RIAA website was hacked last year? It was because the *utter 
muppets* who set the site up created an admin panel that wasn't password 
protected and /listed it in their robots.txt file/ to prevent search 
engines from crawling it. The first cracker who loaded that file up in 
their browser must have fallen off their chair laughing.

More info: http://www.theregister.co.uk/content/6/27230.html
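
To illustrate why listing a path in robots.txt offers no protection, 
here is a minimal sketch that pulls every Disallow path out of a 
robots.txt file. The sample contents and the /admin/ path are 
hypothetical, not the actual file from the incident above.

```python
# Hypothetical robots.txt, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/   # hypothetical unprotected admin panel
Disallow: /tmp/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in the file, in order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))
```

A dozen lines is all it takes to turn the file into a list of places 
worth a look, so anything sensitive needs real access control such as a 
password rather than a Disallow line.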

-- 
Simon Willison
Web development weblog: http://simon.incutio.com/


