[thelist] mod_perl mirroring proxy help

Sam sam at sam-i-am.com
Tue Jan 14 12:56:01 CST 2003


hi list,
I'm having another stab at an old problem: how to create a local mirror
of any given site/page I browse. It should be browser-independent, and
preserve directory structure and file format. I like the idea of
mirroring in real-time, so I can get just the pages and files I actively
choose by browsing, with confidence that what I'm mirroring is exactly
what I'm seeing.
And I'd like to tweak and repurpose it for specific tasks such as
page-weight logging, etc.
I tend to do these things in perl. I want it in perl :)

My questions -- I can see this is going to be a long email -- are: am I
missing a much easier way to accomplish this (in perl and/or apache,
mod_perl)? I don't really understand how to use mod_proxy (having read
the docs), which would seem to accomplish most of what I want.
And how do I handle SSL (https requests)?
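
For comparison, a minimal sketch of what the plain mod_proxy route might
look like. This is an untested fragment under assumptions: mod_proxy is
loaded, and port 8800 is just the example port used in this post. With
CONNECT allowed, https requests pass through as an opaque tunnel, so the
proxy relays them but never sees the page content and cannot mirror it.

```apache
# Hypothetical forward-proxy fragment; assumes mod_proxy is loaded.
Listen 8800
ProxyRequests On
# Allow CONNECT tunnelling to the standard https port only.
AllowCONNECT 443
```

Anyone who can reach that port can use the proxy, so access would need to
be restricted to trusted hosts.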

My latest implementation is a mod_perl proxy server, so in any
client/browser I just tell it to use localhost:8800 or whatever, and all
requests and responses get routed through a perlHandler I set up in
httpd.conf. This works beautifully most of the time. The best thing is
that I can be sitting on a testing station and create a mirror of what
I'm seeing, without installing anything new, or having physical access
to my workstation. Browsing is slowed a little but not terribly. But I
can't mirror SSL content, and I'm still figuring out the best way to
treat sessions and other pages with dynamic content -- any thoughts on a
good scheme for translating urls like
http://hostname.com/bin/search?PHONE=9&param2=4258291232159&name=Some
Name&query=muppet#
into a useable filename for the local mirror? (this is anticipating a
need to parse the downloaded files for urls and fix links so they'll
work locally or at least on a new virtualhost)
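
One possible scheme, sketched in perl with core modules only. The helper
name url_to_path and the naming rules are assumptions for illustration,
not what Mirror.pm does: drop the #fragment (it never reaches the server),
use the hostname as the top directory, keep the path as directories, and
append a short MD5 digest of the query string so that different queries
to the same script land in different files.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a URL to a relative path inside the mirror.
# Returns nothing if the string does not look like scheme://host/...
sub url_to_path {
    my ($url) = @_;

    # The fragment is never sent to the server, so drop it first.
    $url =~ s/#.*\z//s;

    # Split into host, path and optional query string.
    my ($host, $path, $query) =
        $url =~ m{\A[a-z][a-z0-9+.\-]*://([^/?]+)([^?]*)(?:\?(.*))?\z}is
        or return;

    $host = lc $host;
    $host =~ s/:/_/g;                      # keep a port, avoid ':' in names
    $path = '/' if $path eq '';
    $path .= 'index.html' if $path =~ m{/\z};

    # Drop empty and dot segments so the result stays inside the mirror.
    my @parts = grep { length && $_ ne '.' && $_ ne '..' } split m{/}, $path;
    @parts = ('index.html') unless @parts;

    # Replace anything outside a conservative set of filename characters.
    s/[^A-Za-z0-9._-]/_/g for @parts;

    # Same script, different query string: a different file.
    if (defined $query && length $query) {
        $parts[-1] .= '_q' . substr(md5_hex($query), 0, 8);
    }

    return join '/', $host, @parts;
}

print url_to_path('http://hostname.com/bin/search?PHONE=9&query=muppet#'), "\n";
```

The digest keeps filenames short and safe but is not reversible, so a
link-fixing pass would have to run each url it finds through the same
function rather than guess the filename.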

I have this in my httpd.conf:
# handle requests on this port with my Mirror.pm package.
PerlModule Apache::Tests::Mirror
PerlTransHandler Apache::Tests::Mirror

Mirror.pm is based on AdBlocker by Lincoln Stein. I've just replaced the
ad-blocking stuff with my local mirroring code:

http://www.sam-i-am.com/temp/Mirror.pm.txt

Maybe this post belongs on a perl/mod_perl list or newsgroup, but I
don't really follow any at the moment. Can the good people of evolt help
with suggestions, corrections, other ideas?

thanks
Sam
--
sam at sam-i-am.com		#	"my shoe is off
http://sam-i-am.com		#	 my foot is cold"


