[thesite] bad spiders was: comment searches
Daniel J. Cody
djc at members.evolt.org
Mon Nov 19 14:11:36 CST 2001
ya we do. i sent out a note to this list about which user-agents are
being blocked a while back.
as for spiderts that change their user-agent, oy i've heard a lot of
people talk about that.
if they purposely disobey the robots.txt entry and enter the 'trap'
directory, their user-agent *AND* IP address are blacklisted.
now before someone screams, "Dynamic IPs!!" - think about the massive
amounts of data these things are downloading and you'll realize why
they're more than likely on a broadband or direct connection (whose IP
may change infrequently at best).
if that's not good enough, the first time a client enters the 'trap'
directory, their IP address is automatically added to the shitlist.
they'd have to change their IP address each time they visited the
site, and being as they're likely to be automated tools, this isn't the
case.
in a year of having my email address visible from my member page and
archive, i still only get at most 5 spams a month. your mileage may
vary, but let's not advocate shutting people out on the *chance* that
our email addresses may get siphoned up.
.djc.
.jeff wrote:
>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
>>and we already have mechanisms in place to stop spiderts
>>(to use dan's term).
>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
>>
>
> do we? i'm not aware of anything being setup. however, if it is setup and
> uses the techniques that dan outlines in his "how to stop bad robots"
> article, then we're not really all that protected. the info is only really
> protected from the totally automated bots which aren't a threat anyway
> because they don't register accounts or login. what we're not protected
> from is the harvester that uses something like teleport pro with its user
> agent string set to some recognizable browser user agent string.