[thelist] browser requests

Robert Vreeland vreeland at studioframework.com
Sat Jan 28 08:48:13 CST 2006


<snip>
Why should you want to do this?
or
What is so bad about scraping?
</snip>
Stefan, my client has worked very hard, for years, to build his site's content,
and the site has become one of the top sites in its category because of that
hard work. As such, he has no interest in making it easy for others to steal
his content.
<snip>
IMO,
1. If a 'valid customer' thinks the site is so valuable that he/she wants to
archive it on a hard drive, why should I block that? (Think: downloading
before out-of-town trips, broadband at work but wanting to read at home, and
other scenarios.)
</snip>
That is an interesting point. I don't think there is any problem with saving
a page to your drive; most browsers provide functions to do so, and those
functions would not try to fetch the URLs referenced inside a JavaScript
function that is never called. That would only happen if the site were being
scraped or crawled by a script, not a browser. So your valid user would never
even be aware of the blocking.
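To make the trap concrete, here is a minimal sketch in Python, assuming the
page template can emit an extra script block. The trap path
"/content/full-archive.html" and the names trap_snippet, is_trap_request and
neverCalled are hypothetical, chosen only for illustration. The URL appears
only inside a function that nothing calls, so a browser never requests it,
while a script that pulls every URL-looking string out of the HTML may.

```python
# Hypothetical trap URL: referenced only inside a JS function that is never called.
TRAP_PATH = "/content/full-archive.html"


def trap_snippet(trap_path: str = TRAP_PATH) -> str:
    """Return a <script> block that mentions trap_path but never runs it."""
    return (
        '<script type="text/javascript">\n'
        "function neverCalled() {\n"
        f"  window.location = '{trap_path}';\n"
        "}\n"
        "</script>\n"
    )


def is_trap_request(path: str, trap_path: str = TRAP_PATH) -> bool:
    """True if a requested path (ignoring any query string) is the trap URL."""
    return path.split("?", 1)[0] == trap_path
```

The server side would then treat any client that requests the trap path as a
likely scraper. Nothing here stops a scraper that ignores script contents.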
<snip>
 2. If I block scraping, could I accidentally block search-engines? Or
web-archives?
</snip>
That's what I'm trying to figure out. My guess is no; I don't think the
major search engines are in the habit of parsing scripts embedded in the page,
but rather ignore them. Moreover, the log files do not show search engines
ever requesting the URLs that appear in the uncalled JavaScript function.
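One way to keep checking the logs is to list every client that requested the
trap URL, along with its user agent. This is a sketch assuming the server
writes the Apache/NCSA "combined" log format. The function name trap_hits and
the trap path are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Combined log format: ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def trap_hits(lines: Iterable[str], trap_path: str) -> List[Tuple[str, str]]:
    """Return (ip, user_agent) for each logged request of trap_path."""
    hits = []
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ip, _method, path, _status, agent = m.groups()
        if path.split("?", 1)[0] == trap_path:
            hits.append((ip, agent))
    return hits
```

If well-known crawler user agents never appear in the result, that supports
the guess above, although user agents can of course be faked.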
<snip>
3. If somebody _really_ wants to scrape your site, he/she will do so. 
Even regular wget has options for random delays between requests, which
would probably get around such limitations.
</snip>
True, and they can use real eyeballs to do it as well, but that doesn't mean
we should not try to stop it. As for the block times, I think at least half an
hour, possibly up to a day, to take into account dynamic IP addresses and
scripts spoofing their addresses.
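A block that expires on its own keeps a reassigned dynamic IP from being shut
out forever. Below is a minimal in-memory sketch; the class name
TemporaryBlockList is hypothetical, and the 30-minute default reflects the
lower end of the range suggested above. A real site would need shared
storage if it runs on more than one server process.

```python
import time
from typing import Dict, Optional


class TemporaryBlockList:
    """Blocks an IP address for a fixed duration, then forgets it."""

    def __init__(self, duration_seconds: float = 30 * 60) -> None:
        self.duration = duration_seconds
        self._expires: Dict[str, float] = {}  # ip -> expiry timestamp

    def block(self, ip: str, now: Optional[float] = None) -> None:
        """Start (or restart) the block window for ip."""
        now = time.time() if now is None else now
        self._expires[ip] = now + self.duration

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        """True while ip's block window is open; drops expired entries."""
        now = time.time() if now is None else now
        expires = self._expires.get(ip)
        if expires is None:
            return False
        if now >= expires:
            del self._expires[ip]
            return False
        return True
```

For a one-day block, construct it with duration_seconds=24 * 60 * 60.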
<snip>
I think the harm that scrape prevention does outweighs the good it
brings.

But that's just my opinion.

Stefan
</snip>
I don't agree, but that's my opinion.

Robert
--
http://Stefan.Waidele.info
http://LinuxBasics.org
http://Krone-Neuenburg.de
-- 

* * Please support the community that supports you.  * *
http://evolt.org/help_support_evolt/

For unsubscribe and other options, including the Tip Harvester and archives
of thelist go to: http://lists.evolt.org Workers of the Web, evolt ! 