[thelist] Re: Spider attack

BMP microme_2000 at yahoo.com
Fri Mar 18 20:04:32 CST 2005


Thanks for letting me know that Mozilla Gecko was not a spider. That helped; now I can recognize what is and is not a spider. The line at the end of this message, copied from my log file, shows the Yahoo Slurp spider following a chain of subdirectories, and I am not sure how it arrived at that order. Yesterday my web log had over 8000 lines like it from Slurp alone. Together with the other spiders, it consumed 520MB of bandwidth on a site that is only about 22MB in total!
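To check figures like these, requests and bytes can be totalled per user agent. Below is a minimal Python sketch. It assumes the log lines end in the same quoted format as the example at the end of this message: request, status, size, referrer, user agent. The function name bytes_by_agent is hypothetical.

```python
import re
from collections import defaultdict

# Matches the tail of a log line in the format of the example in this message:
# "GET /path HTTP/1.0" 200 43025 "-" "User-Agent string"
# search() is used, so any leading host/date fields are simply skipped.
LOG_TAIL = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Return {user_agent: (request_count, total_bytes)} for lines that parse."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_TAIL.search(line)
        if not m:
            continue  # skip lines in some other format
        # A size of "-" means no body was sent; count it as 0 bytes.
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        entry = totals[m.group("agent")]
        entry[0] += 1
        entry[1] += size
    return {agent: (count, total) for agent, (count, total) in totals.items()}
```

Adding up the byte totals for the spider user agents gives the bandwidth they consumed. That figure can then be compared with the size of the site itself.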
 
The same happens with MSN and other spiders. I would like to know how to configure my site so that this kind of inter-directory spidering doesn't happen. Could the way my pages link to each other be the problem? At present I have a dozen or so subdirectories, and I try to put the same column of navigation links to the rest of the site on the pages in each subdirectory. Is this not a good idea? Any suggestions on designing a site so that it is spidered efficiently would be appreciated.
 
Here is an example of a spider request from the web log. There were over 8000 of these, each with a different combination and permutation of subdirectories.
 
"GET /concept/Nature/Articles/Gallery/Aesthetics/Philosophy/PhenText/Philosophy/Nature/Aesthetics/ HTTP/1.0" 200 43025 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
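One way to spot this pattern in the log is to flag request paths in which the same directory name appears more than once. In the line above, Nature, Aesthetics and Philosophy each appear twice. Below is a minimal Python sketch. It assumes the request path has already been extracted from the log line, and the function name repeated_segments is hypothetical.

```python
from collections import Counter


def repeated_segments(path):
    """Return, sorted, the directory names that occur more than once in a request path."""
    # Split on "/" and drop the empty strings produced by leading/trailing slashes.
    segments = [s for s in path.split("/") if s]
    counts = Counter(segments)
    return sorted(name for name, n in counts.items() if n > 1)


path = "/concept/Nature/Articles/Gallery/Aesthetics/Philosophy/PhenText/Philosophy/Nature/Aesthetics/"
print(repeated_segments(path))  # ['Aesthetics', 'Nature', 'Philosophy']
```

Counting how many logged requests return a non-empty list gives a rough measure of how much spider traffic follows these repeating paths.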



