[thelist] htaccess & robots.txt - blocking accesses to images from outside

Paola Kathuria paola at limitless.co.uk
Mon Jun 25 12:04:28 CDT 2001


Google has a beta image search engine: http://images.google.com/

For my personal site, robots.txt and .htaccess have been set up
to disallow indexing of certain directories and to forbid access
to images other than through my web pages.  However, the images are
still appearing on Google's beta image search (e.g., the first two
images in http://images.google.com/images?q=fruit+wallpapers ).

My personal site is http://www.limitless.co.uk/~paola/ and here's
an extract from /robots.txt covering the location of my wallpaper
images.  Is there anything wrong with it?

Disallow: /~paola/wallpapers/small/
Disallow: /~paola/wallpapers/medium/
Disallow: /~paola/wallpapers/1024x768/

Should I include /%7Epaola/... lines too?
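
One way to explore the ~ versus %7E question is to feed the extract to a
robots.txt parser and ask about both spellings. The sketch below uses
Python's standard urllib.robotparser. The "User-agent: *" line, the bot
name "ExampleImageBot" and the file name fruit.jpg are assumptions added
for illustration and do not come from the real site. This parser
normalises percent-escapes before comparing, so it treats both spellings
alike; a given crawler may or may not do the same.

```python
from urllib.robotparser import RobotFileParser

# The three Disallow lines are from the post. The "User-agent: *" line is an
# assumption: the parser ignores Disallow lines that precede any User-agent line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~paola/wallpapers/small/
Disallow: /~paola/wallpapers/medium/
Disallow: /~paola/wallpapers/1024x768/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://www.limitless.co.uk"


def allowed(path, agent="ExampleImageBot"):
    """Return True if the (hypothetical) agent may fetch BASE + path."""
    return parser.can_fetch(agent, BASE + path)


# fruit.jpg is a made-up file name used only to exercise the rules.
tilde_ok = allowed("/~paola/wallpapers/small/fruit.jpg")
escaped_ok = allowed("/%7Epaola/wallpapers/small/fruit.jpg")
index_ok = allowed("/~paola/index.html")
print(tilde_ok, escaped_ok, index_ok)
```

If a crawler compared paths byte for byte instead, the %7E spelling would
slip past the rules, which is the case that extra /%7Epaola/... lines
would cover.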

I initially set the forbid rule because several members of a
certain web community were linking directly to my 250K wallpapers
and using them as background images on their member pages!  However,
I later realised that the rule would also prevent the images from
appearing on image search engines.

Below is the forbid rule from the .htaccess file - it's obviously
incorrect, or else images wouldn't be displayed in Google.

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://.*limitless.co.*/.*paola/.*$ [NC]
RewriteRule .*\.(jpg|gif)$        -                               [F]

To explain: the ".*limitless.co.*" allows for www.limitless.co.uk,
www.limitless.com, limitless.co.uk and limitless.com, and the
".*paola" allows for ~paola and %7Epaola.  Documentation is at
http://www.engelschall.com/pw/apache/rewriteguide/#ToC38
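
To see how the two conditions combine, here is a minimal Python sketch
that mimics the rule's logic with the same regular expressions. It is
not how Apache evaluates the rule internally, and it assumes that
Python's re module treats these particular patterns the same way as
Apache's regex engine. The function name is_forbidden and the example
paths and referers are hypothetical.

```python
import re

# Same patterns as in the .htaccess extract; [NC] becomes re.IGNORECASE.
OWN_PAGES = re.compile(r"^http://.*limitless.co.*/.*paola/.*$", re.IGNORECASE)
IMAGE = re.compile(r".*\.(jpg|gif)$")


def is_forbidden(path, referer):
    """Mimic the rule: forbid an image request only when BOTH conditions hold."""
    if not IMAGE.search(path):
        return False  # RewriteRule pattern does not match, rule not applied
    referer_not_empty = referer != ""                     # !^$
    referer_not_own = OWN_PAGES.search(referer) is None   # !^http://...paola/...
    return referer_not_empty and referer_not_own


# Hypothetical requests for illustration only.
cases = [
    ("wallpapers/small/fruit.jpg", ""),
    ("wallpapers/small/fruit.jpg", "http://www.limitless.co.uk/~paola/wallpapers/"),
    ("wallpapers/small/fruit.jpg", "http://limitless.com/%7Epaola/index.html"),
    ("wallpapers/small/fruit.jpg", "http://community.example/members/page.html"),
]
for path, referer in cases:
    print(repr(referer), "->", "forbidden" if is_forbidden(path, referer) else "served")
```

Because the first condition lets through any request with an empty
Referer header, a client that sends no referer at all is served the
image. Crawlers commonly send none, so this may be one reason the images
were still fetched, though the sketch alone cannot confirm what Google's
crawler actually sent.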

If anyone can help fix my forbid rule, I'd very much appreciate it.


Paola



