[thelist] Stopping robots.txt being read

Keith cache at dowebs.com
Mon Nov 26 23:40:01 CST 2001


> Just been reading the rather scary article on CNET, about how Google
> can be used to find passwords etc.
> 
> http://news.cnet.com/news/0-1005-200-7946411.html?tag=tp_pr

It must have been a slow news day for CNET to have published that. 
Any search engine can be used that way. Google just makes it easy 
for someone to find sensitive information by mistake rather than 
by trying.

> How does one go about stopping a robots.txt file being read in a
> browser? Given the file has to be accessible to a search engine, 
> how do you protect it so that a human can't simply type in the 
> robots.txt URL manually, read the file, and make some educated 
> guesses about where stuff is on the server?

Why bother? Anyone who wants a list of all the public files in a 
domain can get one pretty easily, robots.txt or not. 

Your concern here is based on a flawed concept of security: hide it 
and they can't find it. That doesn't work, never has, never will. If 
you want to secure a file, secure it, don't hide it. There are three 
basic methods for securing a file, and hiding it is not one of them. 

> wouldn't dream of putting sensitive files in a public area,

That's a good step, but placing a file outside of the domain path or 
placing the file behind Basic Authentication is not necessarily 
secure unless you're the only user on the machine. If you want the 
file unavailable to anyone but the owner of the file, simply do not 
give it world or group permissions. 
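
As a rough illustration of that last point, here is a minimal Python 
sketch, assuming a Unix-style system with standard permission bits. 
The helper name restrict_to_owner is made up for this example and is 
not something from the original discussion. It clears every group and 
world bit and leaves the owner's bits as they were.

```python
import os
import stat


def restrict_to_owner(path):
    """Clear all group and world permission bits on `path`.

    Hypothetical helper for illustration; assumes POSIX permissions.
    Returns the resulting permission bits (for example 0o600).
    """
    # Current permission bits, without the file-type bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Keep only the owner's read/write/execute bits.
    os.chmod(path, mode & stat.S_IRWXU)
    return stat.S_IMODE(os.stat(path).st_mode)
```

A file that starts out as 0644 (owner read/write, group and world 
read) should come back as 0600. Keep in mind that the web server can 
then only read the file if it runs as the file's owner, so this suits 
data that scripts running under your own user need, not pages meant 
to be served publicly.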

keith



