[thelist] Crawler pages: listing the files in a site
Keith Davis
cache at dowebs.com
Sun Jun 3 14:48:56 CDT 2001
Tony Page wrote:
>
> Is there an easy way to construct a crawler page, i.e. a webpage containing
> simple links to every (static) page in a website?
<open_bag>
<rummage>
FOUND IT!
</rummage>
</open_bag>
Hmmm, not exactly...
<tinker>
fiddle
</tinker>
assumes that (static) means the file name contains "htm"
place the script in the document root
fix the shebang line if needed
chmod the script to 0755
create the results file and chmod it to 0666
edit the results_file vars to suit your site
edit nothing else
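Spelled out at a shell prompt, the setup steps above would look something like this. All paths here are stand-ins: in real use you'd cd to your actual document root (e.g. /var/www/html) rather than a temp directory, and use your script's real file name instead of crawler.cgi.

```shell
# Stand-in document root for illustration; substitute your real one.
DOCROOT=$(mktemp -d)
cd "$DOCROOT"
touch crawler.cgi                # the Perl script from below
touch site_index.html            # pre-create the results file
chmod 0755 crawler.cgi           # web server must be able to execute it
chmod 0666 site_index.html       # web server must be able to write it
```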
************************************************
#!/usr/bin/perl
# Walks the document root breadth-first and writes an HTML page
# linking to every file whose name contains "htm".

$results_file_name = "site_index.html";
$results_file_url  = "http://some.com/site_index.html";

# Numbered array names ("a1", "a2", ...) hold each level of the
# directory tree, reached via symbolic references -- so this script
# must not run under "use strict".
$U = "a";
$V = 1;
$W = $U . $V;

# First pass: scan the document root itself.
@things = <*>;
foreach $thing (@things) {
    if (-f $thing && $thing =~ /htm/) { push(@FILES, $thing) }
    if (-d $thing) {
        $THING = $thing . "/";
        @dir   = <$THING*>;
        push(@$W, @dir);    # queue this directory's contents
    }
}

&cycle;

# Descend one directory level per call until no subdirectories remain.
sub cycle {
    foreach $thing (@$W) {
        if (-f $thing && $thing =~ /htm/) { push(@FILES, $thing) }
        if (-d $thing) {
            $THING = $thing . "/";
            @dir   = <$THING*>;
            push(@X, @dir);
        }
    }
    $x = @X;
    if ($x > 0) {
        ++$V; $W = $U . $V; @$W = @X; @X = (); &cycle;
    }
}    # cycle

# Turn each collected path into a link.
foreach $f (@FILES) {
    $F = qq~<a href="$f">$f</a><br>\n~;
    push(@XFILES, $F);
}

open(F, ">$results_file_name") || die "can't write $results_file_name: $!";
print F qq~<html><body bgcolor="#ffffff">\n~;
print F qq~@XFILES~;
print F qq~\n</body></html>~;
close(F);

# CGI redirect: send the browser to the finished index page.
print "Location: $results_file_url\n\n";
*************************************************
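For what it's worth, if you have shell access to the server, a rough equivalent can be knocked together with find(1) alone. This is only a sketch under the same "htm appears in the file name" assumption, not a drop-in replacement for the CGI above; the demo directory and file names are made up for illustration.

```shell
# Demo setup (an assumption for illustration): a throwaway directory
# with a few pages. In practice, run the pipeline below from your
# real document root instead.
site=$(mktemp -d)
cd "$site"
mkdir sub
touch index.html sub/page.htm notes.txt

# Same matching rule as the Perl script: keep any file whose name
# contains "htm", skipping the results file itself.
{
  printf '<html><body bgcolor="#ffffff">\n'
  find . -type f -name '*htm*' ! -name site_index.html | sed 's|^\./||' |
    while read -r f; do
      printf '<a href="%s">%s</a><br>\n' "$f" "$f"
    done
  printf '</body></html>\n'
} > site_index.html
```

Unlike the Perl version it won't double as a CGI redirect, but it's handy for a one-off index from a cron job.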
keith