[thesite] bad spiders was: comment searches

Daniel J. Cody djc at members.evolt.org
Mon Nov 19 14:11:36 CST 2001


yeah, we do. i sent a note to this list a while back about which 
user-agents are being blocked.

as for spiderts that change their user-agent, oy i've heard a lot of 
people talk about that.

if they purposely disobey the robots.txt entry and enter the 'trap' 
directory, their user-agent *AND* IP address are blacklisted.
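
for the curious, here's a rough sketch of what that trap could look 
like - a Disallow'd directory in robots.txt whose handler just logs 
the offender. the paths, file names, and the CGI approach here are 
hypothetical placeholders for illustration, not necessarily what's 
actually running on the server:

#!/usr/bin/env python
# trap.py - hypothetical CGI handler living inside the disallowed
# directory (robots.txt would carry a matching "Disallow: /trap/" line).
# any client that ignores robots.txt and requests this script gets its
# IP address and user-agent appended to a blacklist file.

import os

BLACKLIST = "/var/www/data/blacklist.txt"   # placeholder path

def main():
    ip = os.environ.get("REMOTE_ADDR", "unknown")
    ua = os.environ.get("HTTP_USER_AGENT", "unknown")

    # append the offender; a real setup would also lock the file and dedupe
    with open(BLACKLIST, "a") as f:
        f.write("%s\t%s\n" % (ip, ua))

    # give the harvester nothing worth keeping
    print("Content-Type: text/plain")
    print("")
    print("nothing to see here")

if __name__ == "__main__":
    main()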

now before someone screams, "Dynamic IPs!!" - think about the massive 
amounts of data these things are downloading and you'll realize why 
they're more than likely on a broadband or direct connection (whose IP 
may change infrequently at best).

if that's not good enough, the first time a client enters the 'trap' 
directory, their IP address is automatically added to the shitlist. 
they'd have to change their IP address each time they visited the 
site, and since they're likely automated tools, that isn't going to 
happen.
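
checking incoming requests against that list is then trivial. another 
hedged sketch, using the same made-up file layout as above 
(tab-separated ip and user-agent per line); a match would get a 403 
before the page ever renders:

def is_blacklisted(ip, ua, blacklist_path="/var/www/data/blacklist.txt"):
    """return True if this ip or user-agent appears in the blacklist file."""
    try:
        with open(blacklist_path) as f:
            for line in f:
                bad_ip, _, bad_ua = line.rstrip("\n").partition("\t")
                if ip == bad_ip or (bad_ua and ua == bad_ua):
                    return True
    except IOError:
        pass   # no blacklist yet means nobody is blocked
    return False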

in a year of having my email address visible from my member page and 
the archive, i still get at most 5 spams a month. your mileage may 
vary, but let's not advocate shutting people out on the *chance* that 
our email addresses may get siphoned up.

.djc.

.jeff wrote:


>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
>>and we already have mechanisms in place to stop spiderts
>>(to use dan's term).
>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
>>
> 
> do we?  i'm not aware of anything being set up.  however, if it is set up and
> uses the techniques that dan outlines in his "how to stop bad robots"
> article, then we're not really all that protected.  the info is only
> protected from the totally automated bots, which aren't a threat anyway
> because they don't register accounts or log in.  what we're not protected
> from is the harvester that uses something like teleport pro with its user
> agent string set to some recognizable browser user agent string.





