[thelist] site downloading tool

me me at cgiguy.com
Tue Oct 30 17:12:10 CST 2001


>----- Original Message -----
>From: "Marc Seyon" <seyon at delime.com>
>To: <thelist at lists.evolt.org>
>Sent: Tuesday, October 30, 2001 5:47 PM
>Subject: [thelist] site downloading tool
>
>
>G'day all,
>
>I know this has been discussed, but not sure what to search for 
>in the archives. I'm looking for a tool to download an entire 
>site from the net. Yes, I have only honourable intentions :-)

First, what are honorable intentions? It does not sound very fun...

But yes, it's called writing some sockets code. May I suggest going to
Yahoo and searching for the two words "beej socket".
A guy named "beej" has a page on the web that tells you how to write
sockets code. If you are a Windows coder, it's probably not all that
different. By writing your own sockets code you will find that you
have complete control over your spidering efforts.
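
For a rough idea of what that involves, here is a minimal sketch in
Python (not the C that beej's page teaches). It assumes plain HTTP/1.0
on port 80, with no TLS, redirects, or link following. The names
build_request, split_response and fetch are made up for illustration.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (header bytes, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def fetch(host, path="/", port=80, timeout=10.0):
    """Open a TCP socket, send the request, read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

A spider would call fetch(host, path) for each page, pull the links out
of the body, and queue them up. That part is left out here.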

Don't worry, it's not really that difficult.
I do it, and I am only twelve years old.

Good luck.


>//------------------------------------------------------------
>
>Hello all
>
>Is there a way to *prevent* users from downloading all files 
>at once for unhonorable intentions?
>
>TIA, Mike
>
>
>

Again, what exactly are UNhonorable intentions?
But then, that does stink like fun...

Why not serve up your pages via a CGI script?

Let's say that your home page is index.cgi

And let's also understand that all of your pages will get served
up via index.cgi...

When the user first runs index.cgi, it will retrieve the incoming CGI
data. In that data, let's look for a variable called hitctr. But maybe
name it something else so that they don't even know what it is. And
maybe encode it. The first time index.cgi runs, hitctr will be empty.
It won't contain anything. That's a zero. Create an encoding (or
encryption) scheme that only you know. Let's say that a zero encrypts
to @34sdjasdf. Increment that to a one. OK, now it encrypts to
Wats;pas;df, so the request carries hitctr=Wats;pas;df. If the crawler
does not know what you are doing, they won't be any the wiser. The best
lock is a lock that nobody even sees. Can't break it if ya can't find
it. Baboom.
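
One way the encoding could look, as a sketch only. Instead of a
homemade scheme it swaps in the standard hmac module: the counter is
masked with a pad derived from a secret and followed by a short tag, so
a made-up or edited token gets rejected. SECRET, encode_hitctr and
decode_hitctr are hypothetical names, and the token layout is an
assumption, not something from the original suggestion.

```python
import base64
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical secret, kept on the server only
PAD = hashlib.sha256(SECRET + b"pad").digest()[:4]  # mask for the counter


def _tag(body):
    """Short HMAC tag that proves the token came from this server."""
    return hmac.new(SECRET, body, hashlib.sha256).digest()[:8]


def encode_hitctr(count):
    """Turn a counter (0 .. 2**32 - 1) into an opaque URL-safe token."""
    body = count.to_bytes(4, "big")
    masked = bytes(a ^ b for a, b in zip(body, PAD))
    return base64.urlsafe_b64encode(masked + _tag(body)).decode("ascii")


def decode_hitctr(token):
    """Return the counter, 0 for an empty token, None for a bad one."""
    if not token:
        return 0  # first visit: empty hitctr counts as zero
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except ValueError:  # covers bad base64 and non-ASCII input
        return None
    if len(raw) != 12:
        return None
    body = bytes(a ^ b for a, b in zip(raw[:4], PAD))
    if not hmac.compare_digest(raw[4:], _tag(body)):
        return None
    return int.from_bytes(body, "big")
```

What to do with a None result is up to you. Sending the visitor back to
the home page would fit the rest of the idea.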

Anyway, with each page call, increment the hitctr, and whenever you go
back to the home page, perhaps reset the hitctr to zero. This will allow
normal users to cruise your site without restriction. But anybody that
displays, say, ten pages without going back to the home page, maybe just
redirect them back there anyway.
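
The counting rule itself could be sketched like this, leaving the
encoding aside and using a plain number. The "page" parameter, the
HOME and LIMIT values, and the function names are all assumptions for
illustration. A real index.cgi would read the query string from the
QUERY_STRING environment variable.

```python
from urllib.parse import parse_qs

HOME = "index"  # hypothetical name of the home page
LIMIT = 10      # pages allowed before being sent back home


def read_state(query_string):
    """Pull the requested page and the hit counter out of the CGI data."""
    params = parse_qs(query_string)
    page = params.get("page", [HOME])[0]
    try:
        hitctr = max(0, int(params.get("hitctr", ["0"])[0]))
    except ValueError:
        hitctr = 0  # missing or unreadable counter is treated as zero
    return page, hitctr


def handle_request(page, hitctr):
    """Return (page_to_serve, new_hitctr) for one page call."""
    if page == HOME:
        return HOME, 0           # visiting home resets the counter
    if hitctr >= LIMIT:
        return HOME, 0           # too many pages in a row: back home
    return page, hitctr + 1      # normal page view, count it
```

Every link that index.cgi writes into a page would then carry the new
counter, so the next call picks up where this one left off.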

Anyway, that's a possible solution.

I really don't understand the specifics of your site and your needs, but
maybe this will help to get you started.

rotsa ruck.

me.







