[thelist] Robots.txt, the robots meta tag, and copyright references needed.

rudy r937 at interlog.com
Mon Jan 14 22:23:06 CST 2002


> I am in the middle of the most ridiculous battle with our
> main technical department in the history of man.

hi april

see aardvark's reply, he's already covered the salient points

i just wanted to add that when you show "technical" people that they are
wrong, they will resent it, and the relationship will continue to suffer

so this is more a question of politics, which really means the fine art of
getting people to do what you want and have them think it was their idea in
the first place  'o)

>Here is their chain of thought, where they got it I don't know:
>1.  Having a robots.txt prevents people from copying our information
> on our websites.

they're probably thinking of .htaccess

robots.txt can be used to ask certain user agents (robots) to stay out, but
it's only a request, and browsers like netscape navigator and internet
explorer never read it anyway -- presumably you want those to see the page,
so there's no way to stop people from copying the information

see http://www.robotstxt.org/wc/robots.html


>2.  If we don't have a robots.txt disallowing all access, we are giving
>people a legal right to take our information.

feh, that is "so very wrong" as the saying goes

here are two sites to get you started

http://www.hwg.org/resources/?cid=18
ivan hoffman's site is a must-see

http://www.nolo.com/lawcenter/ency/index.cfm/catID/804B85E3-9224-47A9-A7E6B5BD92AACD48#2EB060FE-5A4B-4D81-883B0E540CC4CB1E
i don't suppose these people ever heard of user-friendly urls -- i mean,
look at the in-page anchor they're using (the string after the #)

and what does "disallow all access" mean?  i cannot believe that's what
your technical people actually said -- this is the same thing as not having
a web site at all!!  take it all down, delete the files, shut off the
server, and don't forget to turn out the lights as you leave, because the
web site is no more and neither you nor the technical people have jobs...
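in case it helps to show them what "disallow all" actually does, here's a
minimal sketch using python's standard urllib.robotparser, assuming a
hypothetical site at www.example.com. the catch-all rule bars every robot
that obeys robots.txt from every path, front page included, so a search
engine that honours it lists nothing at all

```python
from urllib.robotparser import RobotFileParser

# "disallow all access": the catch-all agent (*) is barred from every path,
# because every URL path on the site starts with "/".
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robot that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# www.example.com is a placeholder site used only for illustration.
for url in ("http://www.example.com/",
            "http://www.example.com/products/index.html"):
    print(url, may_crawl(ROBOTS_TXT, "Googlebot", url))  # both print False
```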

>3.  Besides that, the robots.txt physically prevents all web spiders from
>accessing our site.

again, are you sure your technical folks said that, and it's not just your
interpretation of what they said?

robots that visit your site identify themselves (in the user-agent string),
and if your robots.txt file is set up properly, you can exclude certain ones
that way -- i.e. the ones which identify themselves honestly and choose to
obey the file

but this does not stop a robot from masquerading as internet explorer, and
you're not going to turn that user agent away, are you? (see comment above
about turning off the lights...)

so it's not "all" robots
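here's a small sketch of the masquerade problem, again with python's
urllib.robotparser. "ExampleBot" is a hypothetical robot name and the
internet explorer string is just a typical one from that browser; the robot
is the one that decides which name to send, so the same robot gets a
different answer just by changing its user-agent

```python
from urllib.robotparser import RobotFileParser

# Exclude one robot by name; every other agent, browsers included, is allowed.
# "ExampleBot" is a hypothetical robot name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /
"""


def may_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robot that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "http://www.example.com/catalog.html"
HONEST = "ExampleBot/1.0"  # the robot tells the truth about who it is
DISGUISED = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"  # same robot, posing as IE

print(may_crawl(ROBOTS_TXT, HONEST, URL))     # False
print(may_crawl(ROBOTS_TXT, DISGUISED, URL))  # True
```

and of course a robot that ignores robots.txt never asks the question in the
first place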

>4.  We should contact search engines and tell them our keywords...  It
>might take a bit of following up, but that's what I'm for.  (Gods, I can
>see that email now... Dear Google...)

they probably mean manual submission, and they're right, that is the best
way to get listed, as opposed to waiting for the search engine robots to
stumble across your site, which they're only ever going to do from some
other web site which points to yours (and which, by the way, does not say
NOFOLLOW in its meta tags)

manual submission not only triggers the visit by the search engine robot,
it can also result in the best listing because some search engines let you
supply the description at time of submission -- visit each search engine
and look for the "add your site" page

>5.   Since I'm so difficult, they have found a way to add a NOFOLLOW
> robots meta tag to the front page, so search engines can read that...

"since i'm so difficult" says plenty about this relationship

ask them nicely why search engines should be prevented from indexing other
pages within your web site besides the front page

if the idea is that the technical people only want the front page listed in
search engines (maybe they think all site visitors have to start at the
front page?), then yeah, that's the way to do it (although other sites that
link to your inner pages can still lead robots there), but that's just plain
stupid, and they need to be (re)educated about web design
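to see what the tag actually tells a robot, here's a rough sketch of how a
crawler that honours the robots meta tag might read it, using python's
standard html.parser. the page and the "index,nofollow" value are made up for
illustration, and how each search engine really treats these directives is
up to that engine. the front page itself may still be indexed; NOFOLLOW only
tells the robot not to follow the links on it

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser already lowercases attribute names; values may be None.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical front page carrying the tag the technical people want to add.
FRONT_PAGE = """<html><head>
<title>Home</title>
<meta name="robots" content="index,nofollow">
</head><body><a href="/products.html">Products</a></body></html>"""

directives = robots_directives(FRONT_PAGE)
may_index = "noindex" not in directives          # page itself can be listed
may_follow_links = "nofollow" not in directives  # links to inner pages ignored
print(may_index, may_follow_links)  # True False
```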


rudy

