[thelist] Crawler pages: listing the files in a site

Keith Davis cache at dowebs.com
Sun Jun 3 14:48:56 CDT 2001


Tony Page wrote:
> 
> Is there an easy way to construct a crawler page, i.e. a webpage containing
> simple links to every (static) page in a website?

<open_bag>
 <rummage>
 FOUND IT!
 </rummage>
</open_bag>

Hmmm, not exactly...

<tinker>
 fiddle
</tinker>

assumes that (static) means the file name contains "htm"

place it in the document root
fix the shebang line if needed
chmod it to 0755
create & chmod(0666) the results file (sketch below)
edit the results_file vars to suit your site
edit nothing else
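
If you'd rather do that create & chmod step from Perl than from the
shell, here's a minimal one-time sketch (the file name is just the
script's default; match it to your $results_file_name):

#!/usr/bin/perl
# one-time setup sketch: create the results file world-writable so
# the web server's CGI process can overwrite it later
my $file = "site_index.html";    # must match $results_file_name below
open(my $fh, ">", $file) or die "can't create $file: $!";
close($fh);
chmod(0666, $file) or die "can't chmod $file: $!";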
************************************************

#!/usr/bin/perl

$results_file_name = "site_index.html";
$results_file_url  = "http://some.com/site_index.html";

# each level of the traversal gets its own symbolically named array:
# @a1, @a2, @a3 ... (this trick needs to run without "use strict")
$U = "a";
$V = 1;
$W = $U.$V;

# top level: keep "htm" files, queue directory contents for level 1
@things = <*>;
foreach $thing (@things){
	if(-f $thing && $thing =~ /htm/){push(@FILES,$thing)}
	if(-d $thing){
		$THING = $thing."/";
		@dir = <$THING*>;
		push(@$W,@dir);
	}
}
&cycle;

# walk the current level; files go to @FILES, directory contents to
# @X, and if @X picked anything up, recurse one level deeper
sub cycle{
	foreach $thing (@$W){
		if(-f $thing && $thing =~ /htm/){push(@FILES,$thing)}
		if(-d $thing){
			$THING = $thing."/";
			@dir = <$THING*>;
			push(@X,@dir);
		}
	}
	$x = @X;
	if($x > 0){
		++$V; $W = $U.$V; @$W = @X; @X = (); &cycle;
	}
}#cycle

# the glob paths are relative to the document root, so they double
# as URLs for the links
foreach $f (@FILES){
	$F = qq~<a href="$f">$f</a><br>\n~;
	push(@XFILES,$F);
}

open(F,">$results_file_name") or die "can't write $results_file_name: $!";
print F qq~<html><body bgcolor="#ffffff">
~;
print F qq~@XFILES~;
print F qq~
</body></html>~;
close(F);

# CGI redirect: send the browser to the page we just built
print "Location: $results_file_url\n\n";

*************************************************
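
To use it, just hit the script's URL once in a browser; the Location
header at the end redirects you to the freshly built index page. And
for the record, the same walk can be done with the core File::Find
module instead of the stacked @a1/@a2/... arrays -- a minimal sketch,
again assuming the script runs from the document root:

#!/usr/bin/perl
use File::Find;

# collect every "htm" file under the current directory, recursively
my @files;
find(sub {
	push(@files, $File::Find::name) if -f && /htm/;
}, ".");

# emit the same link lines the script above writes out
print qq~<a href="$_">$_</a><br>\n~ for @files;

(File::Find hands back paths with a leading "./", which browsers are
happy to follow.)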

keith



