[thelist] Stopping robots.txt being read
Keith
cache at dowebs.com
Tue Nov 27 15:17:54 CST 2001
Daniel offered one way to keep robots.txt from appearing in browsers
while still allowing search bots to read it. Here's another approach that
does not involve getting into httpd.conf: turn robots.txt into an SSI
document. (The server parses the SSI before the file reaches a search
robot or a browser, so robots see the processed output just as browsers do.)
Place an .htaccess file in the domain root directory with
AddType text/x-server-parsed-html .txt
in it.
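That AddType line is the Apache 1.3 way of enabling SSI parsing. On Apache 2 the x-server-parsed-html type may not be honored; a sketch of an equivalent .htaccess, assuming mod_include is loaded and AllowOverride permits Options, would be:

```apacheconf
# Parse .txt files for server-side includes (Apache 2 mod_include)
Options +Includes
AddOutputFilter INCLUDES .txt
```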
Then put
<!--#if expr="${HTTP_USER_AGENT} = /MSIE/" -->
<!--#elif expr="${HTTP_USER_AGENT} = /Mozilla/" -->
<!--#else -->
file1.html
file2.html
file3.html
etc.
<!--#endif -->
in robots.txt. MSIE and Mozilla browsers get an empty response; everything
else gets the list of files. Search bots that identify themselves as MSIE or
Mozilla get nothing, the same as if robots.txt were empty.
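The branching can be sanity-checked offline. This shell sketch mirrors what the SSI conditional returns for a given User-Agent string (the function name and file list are illustrative, not from the server setup itself):

```shell
# Mirror of the SSI logic: agents matching MSIE or Mozilla get an empty
# body; anything else gets the file list.
robots_for_agent() {
  case "$1" in
    *MSIE*)    ;;                                    # first branch: nothing
    *Mozilla*) ;;                                    # elif branch: nothing
    *)         printf 'file1.html\nfile2.html\n' ;;  # else branch: the list
  esac
}

robots_for_agent "Mozilla/4.0 (compatible; MSIE 5.5)"   # prints nothing
robots_for_agent "Scooter/3.2"                          # prints the list
```

Note that the MSIE check must come first: an MSIE user agent string also contains "Mozilla", just as in the SSI's if/elif ordering.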
I still think this is a "busy work" exercise if the data is indeed secure and
not merely hidden.
> Care to elaborate on the 3 ways...
Your way, my way, and the right way <grin>
The two most common are equally strong and equally weak: placing the
file outside the document root, and placing the file behind an .htaccess
barricade. Both rely on second-layer denial administered by the web
server. Both will protect variable-length flat files and fixed-length/random-
access (SQL) files if there are no other users on the same machine. If there
are multiple users on the machine, and the files have rw-rw-rw-
permissions so that CGI (Perl and PHP) can read/write them, then they are
also readable and writable by anyone on the machine - regardless of where
they are located.
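The shared-machine problem is easy to see from a shell. A sketch (the filename is hypothetical) of the mode such a setup implies:

```shell
# A data file opened up so a CGI running as "nobody" can write it:
touch guestbook.dat
chmod 666 guestbook.dat            # rw-rw-rw-
ls -l guestbook.dat | cut -c1-10   # -rw-rw-rw- : every local user can read AND write it
```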
A more secure (industrial-strength) method relies on Unix permissions,
which are administered by the OS kernel itself. A file with rw-------
permissions is inaccessible to anyone but its owner - regardless of where
it is located. If the file and any CGI accessing it are owned by the same
user, and that user is not "other" (aka nobody || world), and CGI processes
are forced to run as their owner, then CGI can read/write the file with only
rw------- permissions, and no one else (other than root) has access, period.
CGI processes can be forced to run as their owner by suexec, cgiwrap, or
mod_cgiwrap/mod_phpcgiwrap.
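Once CGI runs as the file's owner, owner-only permissions are enough. The same hypothetical file, locked down:

```shell
# Owner-only mode: readable/writable by the owning account and root, no one else.
touch guestbook.dat
chmod 600 guestbook.dat            # rw-------
ls -l guestbook.dat | cut -c1-10   # -rw------- : owner (and root) only
```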
keith