[thelist] SE spidering spectra URL's

Steve Cook sck at biljettpoolen.se
Fri Nov 24 01:19:02 CST 2000


One reason I know of for not indexing "?var=val" type URLs is that the
variables can be used by unscrupulous site owners to "spam" search engines.
A pr0n site, for instance, can create an endless range of pages, all with
subtle differences and with different query strings in the URL, from just
one page! Telling those apart from the different results generated by a
legitimate database-driven site is almost impossible.

Also, think about the difference between
"?page=results1&template=dark" and "?page=results1&template=light". In
effect they are probably the same page, simply with a different colour
scheme, but who knows? This is a trivial example, but it could lead to the
indexing of huge amounts of redundant information. If you searched for
"database tutorials" in Google and were presented with 200 pages of results
where 80% were the same page from a database-generated site sorted in
different ways, you would soon start to think the system was breaking down
:-)
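
To make the redundancy concrete, here is a rough Python sketch of how an
indexer might collapse such URLs into a single key before indexing them.
The names in PRESENTATION_PARAMS and the example.com URLs are my own
assumptions, purely for illustration. A real spider has no reliable way of
knowing which variables are cosmetic, which is exactly the problem.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of variables that only change presentation.
# A real spider cannot know this for an arbitrary site.
PRESENTATION_PARAMS = {"template", "sort"}

def canonical_url(url):
    """Drop presentation-only variables and sort the remaining ones."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    pairs.sort()  # so that ?a=1&b=2 and ?b=2&a=1 give the same key
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), "")
    )

urls = [
    "http://example.com/index.cfm?page=results1&template=dark",
    "http://example.com/index.cfm?page=results1&template=light",
    "http://example.com/index.cfm?template=light&page=results2",
]

# The dark and light variants collapse into one entry.
unique = sorted({canonical_url(u) for u in urls})
```

With the three URLs above, only two distinct keys remain. The weak point is
the hand-written parameter list: it works for a site you know, not for the
web at large.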

Of course, it's quite possible that many of these obstacles could be overcome
with clever indexing techniques, but the nature of some people is to abuse
the system, and it's pretty certain that indexing query strings would lead to
a huge increase in search engine spamming.
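
As one guess at what a "clever indexing technique" could look like, the
Python sketch below compares two pages by the overlap of their three-word
sequences (shingles) and treats them as near-duplicates above a threshold.
The function names, the shingle size and the 0.7 threshold are assumptions
of mine, not anything a search engine is known to use, and a determined
spammer could vary the text enough to slip under any fixed threshold.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences found in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def is_near_duplicate(text_a, text_b, threshold=0.7):
    """True when the two texts share most of their shingles."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold

# The same result page sorted two ways, plus an unrelated page.
by_date = ("database tutorials for beginners covering tables "
           "indexes and queries sorted by date")
by_title = ("database tutorials for beginners covering tables "
            "indexes and queries sorted by title")
other = "cheap flights to stockholm book your tickets today"

same_page = is_near_duplicate(by_date, by_title)
different_page = is_near_duplicate(by_date, other)
```

Here the two sorted variants are flagged as near-duplicates and the
unrelated page is not, but the spider still has to fetch every page before
it can compare them, so the crawling cost stays.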

Just my 2 öre!

.steve


----------------------------------
   WapWarp - http://wapwarp.com
 Wap-Dev - http://www.wap-dev.net
 Cookstour - http://cookstour.org
----------------------------------

> -----Original Message-----
> From: Raymond K. Camden [mailto:rcamden at allaire.com]
> Sent: den 24 november 2000 04:36
> To: thelist at lists.evolt.org
> Subject: RE: [thelist] SE spidering spectra URL's
<SNIP>
> 
> P.S. Of course, this isn't really a Spectra or ColdFusion problem, it's a
> spider problem. ASP and PHP sites will have the same issues with spiders.
> Does anyone here actually know someone who works for one of these companies?
> I'd love to hear a good reason for NOT indexing ?var=val type URLs,
> especially in this day and age.
>  



