[thelist] To Crawl... or Not To Crawl

Paul Silver paul.silver at gmail.com
Mon Jul 17 08:43:52 CDT 2006


On 7/11/06, Rob Smith wrote:
> We've got the Google Mini appliance as our search engine. The paradox
> I'm in is that we want it to crawl and capture all the SKU's of all our
> products. The rub is that the pricing is on the same page as the SKU's.
> We don't want the pricing info to be cached. What I've done to trick out
> the Mini is have a page off our department listing page that goes to a
> page with the SKU's, and without the pricing. Upon entering the page,
> you're redirected to the real page with the pricing. The Mini finished
> recrawling our site last night and had all my trickster pages listed as
> "Info: redirected URL" ... it crawled it successfully, but didn't store
> it on the search index as good URLs to list.
>
> I don't want to create a comprehensive page of all SKU's per product. My
> initial go 'round with that was a bloated 6 MB worth of plain text; not
> web friendly to say the least.
>
> My lame next thought would be to store all SKU's on the same page as the
> initial product listing:
>
> Sunset Photo eSatin Paper 300g <div
> style="font-size:1px;color:white">(3PES851150,3PES8511,...etc.)</div>
>
> {next product in department}

You could use a custom meta tag to store the SKU, then if / when you
need to search on it, use the partialfields or requiredfields
parameters in the search URL to match on it.
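
As a rough sketch of that idea (not tested against a Mini), the Python
below builds one meta tag per SKU and a search URL that filters on it.
The tag name "sku", the host mini.example.com and both function names
are made up for illustration. I'm also assuming the usual name:value
form for partialfields / requiredfields, and that the Mini indexes
repeated meta tags with the same name. Any other parameters your front
end needs are left out.

```python
from html import escape
from urllib.parse import urlencode


def sku_meta_tags(skus, name="sku"):
    """Return one <meta> tag per SKU. The tag name 'sku' is an assumption."""
    return [
        '<meta name="%s" content="%s">' % (escape(name), escape(sku))
        for sku in skus
    ]


def sku_search_url(base, sku, query="", exact=False, name="sku"):
    """Build a search URL that filters on the SKU meta tag.

    exact=False uses partialfields, exact=True uses requiredfields.
    """
    flag = "requiredfields" if exact else "partialfields"
    params = {"q": query, flag: "%s:%s" % (name, sku)}
    # urlencode percent-encodes the colon between name and value
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    for tag in sku_meta_tags(["3PES851150", "3PES8511"]):
        print(tag)
    # mini.example.com is a placeholder host
    print(sku_search_url("http://mini.example.com/search", "3PES8511"))
```

The tags would go in the head of each product page, so the SKUs never
have to appear in the visible text.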

Removing the link to the cached version would stop a stale price
showing up there after it has changed. Do you really need the cached
version at all? Presumably people can't buy anything if your site is
down. However, this wouldn't stop the price potentially appearing in
the snippet shown in the search results. I don't know whether that is
something you're willing to accept, or whether you'd still want to
hide it.

HTH

Paul
-- 
Paul Silver
Freelance Web Developer
Tel: 07813 654285
http://www.paulsilver.co.uk and http://www.gsadeveloper.com


