[thelist] Stopping robots.txt being read

Keith cache at dowebs.com
Tue Nov 27 15:17:54 CST 2001


Daniel offered one way to keep robots.txt from appearing in browsers 
while still allowing search bots to read it. Here's another approach that 
does not involve getting into httpd.conf: turn robots.txt into an SSI 
document (the server parses the SSI for a robot's request exactly as it 
does for a browser's).

Place an .htaccess file in the domain root directory with
AddType text/x-server-parsed-html .txt
in it.
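One caveat: that AddType applies to every .txt file under the directory, so 
all of them get run through the SSI parser. If that matters, the directive 
can be scoped (assuming Apache's <Files> container, which is normally 
allowed in .htaccess) to robots.txt alone:

```apache
# Parse only robots.txt as SSI; other .txt files stay plain text.
<Files robots.txt>
AddType text/x-server-parsed-html .txt
</Files>
```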

Then put 

<!--#if expr="${HTTP_USER_AGENT} = /MSIE/" -->
<!--#elif expr="${HTTP_USER_AGENT} = /Mozilla/" -->
<!--#else -->
User-agent: *
Disallow: /file1.html
Disallow: /file2.html
Disallow: /file3.html
etc.
<!--#endif -->

in robots.txt. MSIE and Mozilla browsers get an empty response; everything 
else gets the list. Search bots that identify themselves as MSIE or Mozilla 
get nothing, the same as if robots.txt were empty.
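The SSI conditional boils down to a simple user-agent test. Here is a 
minimal Python sketch of the same branching (the Disallow body and the 
bot names are made-up placeholders, not from the post):

```python
import re

# Hypothetical payload served only to clients that don't look like browsers.
BOT_BODY = "User-agent: *\nDisallow: /private/\n"

def robots_txt(user_agent):
    # The MSIE and Mozilla branches of the SSI emit nothing; every
    # other client falls through to the #else branch and gets the rules.
    if re.search(r"MSIE|Mozilla", user_agent):
        return ""
    return BOT_BODY

print(repr(robots_txt("Mozilla/4.0 (compatible; MSIE 5.5)")))  # ''
print(repr(robots_txt("Scooter/3.2")))
```

As the post notes, any bot that sends a Mozilla-style user-agent string 
falls into the empty branch along with the browsers.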

I still think this is a "busy work" exercise if the data is indeed secure and 
not merely hidden. 

> Care to elaborate on the 3 ways...

Your way, my way, and the right way <grin>

The two most common methods are equally strong and equally weak: placing the 
file outside the document root, and placing the file behind an .htaccess 
barricade. Both rely on a second layer of denial administered by the web 
server. Both will protect variable-length flat files and fixed-length/random-
access (SQL) files if there are no other users on the same machine. If there 
are multiple users on the machine, and the files have rw-rw-rw- (666) 
permissions so that CGI (Perl and PHP) can read and write them, then they are 
also readable and writable by anyone on the machine - regardless of where 
they are located.

A more secure (industrial-strength) method relies on Unix permissions, 
which are enforced by the OS kernel itself. A file with rw------- (600) 
permissions is inaccessible to anyone but the owner of the file - 
regardless of where it is located. If the file and any cgi accessing it are 
owned by the same user, and if that user is not "other" (aka nobody || 
world), and if cgi processes are forced to run as their owner, then cgi can 
read/write the file with only rw------- permissions, and no one else (other 
than the superuser) has access, period. Cgi processes can be forced to run 
as their owner by suexec, cgiwrap, or mod_cgiwrap/mod_phpcgiwrap. 
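Under suexec or cgiwrap the CGI's effective uid matches the data file's 
owner; a script can sanity-check that condition before touching the file. 
A hedged sketch (the helper name is made up):

```python
import os

def runs_as_file_owner(path):
    """True when the current process's effective uid owns `path` --
    the situation suexec or cgiwrap arranges for a CGI script, and
    the only case in which a rw------- (600) data file is usable."""
    return os.stat(path).st_uid == os.geteuid()
```

A CGI could refuse to start when this returns False, rather than failing 
later with a permission error.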

keith
