[thelist] Robots.txt, the robots meta tag, and copyright references needed.

Mon Jan 14 16:55:28 CST 2002

> From: April <april at farstrider.org>
[...] 
> Here is their chain of thought, where they got it I don't know:
> 1.  Having a robots.txt prevents people from copying our information

false... robots.txt files are there to guide *good* spiders who pay 
attention to them... *anyone* can ignore a robots.txt file, and 
everyone who steals a site or harvests email addresses does just 
that...
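
a minimal sketch of the point, in python, in case it helps to show them... the robots.txt contents, the "GoodBot" agent name, the example.com url and both function names are made up for illustration; the only real API used is the standard library's urllib.robotparser... a polite spider asks the file for permission, a rude one never looks:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows everything for every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_spider_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """A well-behaved spider parses robots.txt and obeys the answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def rude_spider_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """A harvester skips the check entirely; nothing on the server stops it."""
    return True


# Hypothetical agent name and URL, for illustration only.
url = "http://example.com/index.html"
polite = polite_spider_may_fetch(ROBOTS_TXT, "GoodBot", url)
rude = rude_spider_may_fetch(ROBOTS_TXT, "GoodBot", url)
```

the check lives entirely in the spider's own code... the server hands out the same pages either way, which is why the file can't work as a lock...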

> on our websites. 2.  If we don't have a robots.txt disallowing all
> access, we are giving people a legal right to take our information. 3.

false... lack of a robots.txt file in no way supersedes your copyright 
over your own content... it has no bearing on a copyright dispute... 

>  Besides that, the robots.txt physically prevents all web spiders from
> accessing our site. 4.  We should contact search engines and tell them

false... all spiders can ignore them... it's easy to prove, too... they 
are, after all, just text files...

> our keywords...  It might take a bit of following up, but that's what
> I'm for.  (Gods, I can see that email now... Dear Google...) 5.

huh?  that's a new one...  IOW, nope, it'll never work... search 
engines that rely on spiders will *only* index a site, not an email or 
a phone call...  ranking is based on more than keywords anyway...

> Since I'm so difficult, they have found a way to add a NOFOLLOW robots
> meta tag to the front page, so search engines can read that... no, we
> can't take down the robots.txt and put robots meta tags on other
> pages.

ok, that meta tag now tells the *good* search engines not to follow 
any links off the front page, so they won't reach the rest of the 
site that way, but the email spam harvesters and other things will 
just ignore it...
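
for what it's worth, here's a rough python sketch of what a polite spider does with that tag... the sample page, the class name and the helper name are all hypothetical; the parsing uses the standard library's html.parser... NOFOLLOW asks spiders not to follow the page's links, NOINDEX asks them not to index the page, and either one is only a request:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical front page carrying the NOFOLLOW tag described above.
FRONT_PAGE = """
<html><head>
<meta name="ROBOTS" content="NOFOLLOW">
</head><body><a href="/contact.html">contact</a></body></html>
"""

found = robots_directives(FRONT_PAGE)
may_follow_links = "nofollow" not in found
may_index_page = "noindex" not in found
```

a spider that honors the tag stops at the front page here... a harvester just pulls every href and address out of the html and never runs a check like this...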

where did these guys get their info?

> I don't know how they decided that robots.txt's are a legal issue, but
> I don't think I can convince them otherwise without the name of an
> important person behind it.  Can anyone point me to articles which
> -don't- refer to robots.txt as a security measure, and explain why
> not?  And if anyone has ever seen anything about legal issues
> involving robots.txt, if such even exist, I would really love those
> links.  Also, I'm looking for an article on those email harvesters
> which will use a robots.txt to choose where to index first.
[...]

never even occurred to me to archive this kinda stuff... it's just 
common sense... and experience... and reading the docs...