[thelist] Stopping robots.txt being read

Sean Stephens lasso at treefroginteractive.com
Tue Nov 27 12:33:31 CST 2001


Does anyone have a link to exclusion syntax for robots?

http://www.robotstxt.org is great, but it doesn't actually explain much of
the syntax. For example, what does the following mean? Does it block all
images, or does it only block the named folders?

User-agent: *
# directed to all spiders, not just Scooter
Disallow: /source
Disallow: /photo
Disallow: /images
Disallow: /backontrack
Disallow: /cgi-bin
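
For what it's worth, each Disallow value is matched as a path prefix from the
site root, so this is a folder (strictly, a prefix) solution and not a rule
about image files in general. Below is a minimal sketch that checks this with
Python's standard urllib.robotparser. The agent name "ExampleBot" and the
sample paths are made up for illustration, and a given crawler may interpret
the file slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the message above, with consistent spacing.
RULES = """\
User-agent: *
# directed to all spiders, not just Scooter
Disallow: /source
Disallow: /photo
Disallow: /images
Disallow: /backontrack
Disallow: /cgi-bin
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under RULES.

    "ExampleBot" is a hypothetical crawler name; the "User-agent: *"
    record applies to any agent that has no record of its own.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths chosen to show the prefix behaviour.
    for path in ("/images/logo.gif", "/imagesold/x.gif",
                 "/img/logo.gif", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these rules anything whose path starts with /images is blocked, including
/imagesold/x.gif, while an image kept elsewhere, such as /img/logo.gif, is
still allowed.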


> From: Andy Warwick <mailing.lists at creed.co.uk>
> Reply-To: thelist at lists.evolt.org
> Date: Tue, 27 Nov 2001 04:14:35 -0500
> To: thelist at lists.evolt.org
> Subject: Re: [thelist] Stopping robots.txt being read
> 
> On 2001-11-26 at 22:40, cache at dowebs.com (Keith) wrote:
> 
>>> Just been reading the rather scary article on CNET, about how Google
>>> can be used to find passwords etc.
>>> 
>>> http://news.cnet.com/news/0-1005-200-7946411.html?tag=tp_pr
>> 
>> It must have been a slow news day for CNET to have published that
>> - any search engine can be used that way, Google just makes it
>> easy for someone to find sensitive information by mistake rather
>> than by trying.
> 
> I agree. Slow news day. The fact that you could search for non-text files was
> news though.
> 
>>> How does one go about stopping a robots.txt file being read in a
>>> browser? Given the file has to be accessible to a search engine,
>>> how do you protect it so that a human can't simply type in the
>>> robots.txt URL manually, read the file, and make some educated
>>> guesses about where stuff is on the server?
>> 
>> Why bother? Anyone who wants a list of all the public files in a
>> domain can get one pretty easily, robots.txt or not.
>> 
>> Your concern here is based on a flawed concept of security: hide it
>> and they can't find it.
> 
> Security by obscurity is no security at all. I know all this. My concern is
> not because I'm hiding stuff like that; it's just one more thing to lock down
> if possible to deter the casual snooper. Kinda like having an empty alarm box
> on the wall of your house. A potential crook will go looking for easier
> pickings, even if the real security value of the box is worth squat.
> 
>> That doesn't work, never has, never will. If you
>> want to secure a file, secure it, don't hide it. There are 3 basic
>> methods for securing a file, hiding it is not one of them.
> 
> Care to elaborate on the 3 ways...
> 
>>> wouldn't dream of putting sensitive files in a public area,
>> 
>> That's a good step, but placing a file outside of the domain path or
>> placing the file behind Basic Authentication is not necessarily
>> secure, unless you're the only user on the machine.
> 
> If this stuff really needed to be secure, I'd be putting it on its own box,
> hosted on site, firewalled from the main server, also hosted on site. As I
> said, it's not. But every locked door you can put between someone and what
> they shouldn't be looking at, even if there is no real harm in looking, helps.
> Hiding the robots.txt file is an interesting exercise, and one more thing to
> tick off the list of potential backdoors for malicious 'script kiddies'.
> 
>> If you want the 
>> file unavailable to anyone but the owner of the file, simply do not
>> give it world or group permissions.
> 
> I agree. And put it on an encrypted removable disk on a chain around their
> neck.
> 
> It's all a matter of degree and tradeoffs, but anything that will help, like
> hiding robots.txt if possible, is worth doing IMHO.
> 
> Andy W
> 




