[thelist] Stopping robots.txt being read

Andy Warwick mailing.lists at creed.co.uk
Mon Nov 26 13:34:46 CST 2001


I've just been reading the rather scary article on CNET about how Google can be
used to find passwords, etc.

http://news.cnet.com/news/0-1005-200-7946411.html?tag=tp_pr

While I'd already thought about - and covered - the issues raised, and wouldn't
dream of putting sensitive files in a public area, it did bring back to mind an
important question that has been bugging me for a while.

How does one go about stopping a robots.txt file from being read in a browser?
Given that the file has to be accessible to a search engine, how do you protect it
so that a human can't simply type in the robots.txt URL manually, read the file,
and make some educated guesses about where things are on the server?

For instance, type in www.<mysite>.co.uk/robots.txt and it reveals that a
directory called /licences is disallowed. That seems like a good place to start
reverse-engineering a site's structure in search of backdoors. (There's actually
no such directory, so don't bother...)
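To make the concern concrete, here is a minimal Python sketch. It assumes a
hypothetical robots.txt modelled on the /licences example above and a placeholder
host, www.example.co.uk. It shows that the same Disallow lines a well-behaved
crawler obeys (checked here with the standard library's urllib.robotparser) can
be pulled out by anyone who fetches the file, because it is just plain text.

```python
# Minimal sketch: the Disallow lines in robots.txt are plain text that any
# client, crawler or curious human, can read. The file content below is a
# hypothetical example modelled on the /licences case; the host is a placeholder.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /licences/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path listed on a Disallow line, i.e. what a human would see."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop any trailing comment, then split the "Field: value" pair.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A well-behaved crawler parses the very same file to decide what to skip.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(disallowed_paths(ROBOTS_TXT))  # → ['/licences/']
print(parser.can_fetch("*", "http://www.example.co.uk/licences/"))  # → False
print(parser.can_fetch("*", "http://www.example.co.uk/index.html"))  # → True
```

The point of the sketch is that there is no separate "robot-only" view of the
file: the crawler and the person typing the URL receive identical text.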

Any good way of stopping humans reading robots.txt, while still allowing robots
to use it?

Andy W
