[thelist] some coding help request!
Carlos Costa Portela
ccosta at servidores.net
Thu Jul 18 06:56:01 CDT 2002
On Thu, 18 Jul 2002, Erick Papadakis wrote:
> 1. the domain name
> 2. all the words in the string (non-symbols, and non-numbers, only
> characters) which are NOT from an exclusion list.
Perhaps something like this would be interesting to you.
Hope this helps,
Carlos.
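#!/usr/bin/perl
use strict;
use warnings;

my $string         = "http://www.cnn.com/2002/fbi.exec.binladen/index.html";
my $exclusion_list = "index|html|www|http|https|com|net|org";
my @excl           = split(/\|/, $exclusion_list);
my @result;

# Capture the domain ($1) and everything after it ($2).
if ($string =~ m{^https?://([^/]+)/(.+)}) {
    my $domain = $1;
    my $out    = $2;
    # Turn every run of non-word characters into a single dot.
    $out =~ s/\W+/./g;
    my @words = split(/\./, $out);
    print "domain: $domain\n";
    print "$out\n";
    for my $word (@words) {
        # Keep letters-only words, as requested (no numbers, no symbols).
        next unless $word =~ /^[a-zA-Z]+$/;
        my $is_here = 0;
        # Compare against the exclusion list (not against @words itself).
        for my $excluded (@excl) {
            if ($word eq $excluded) {
                $is_here = 1;
                last;
            }
        }
        push(@result, $word) unless $is_here;
    }
}
for my $i (0 .. $#result) {
    print "$i.- $result[$i]\n";
}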
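
As a variation on the script above, here is a minimal sketch that swaps the nested loops for a hash lookup. The extract_words helper is a hypothetical name, and splitting on anything that is not a letter is my own reading of "only characters", not something confirmed in the request, so treat it as a starting point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the domain
# followed by the letters-only words of the path that are not excluded.
sub extract_words {
    my ($url, @excluded) = @_;
    my %skip = map { $_ => 1 } @excluded;    # hash lookup replaces the inner loop

    # Same idea as the script above: $1 is the domain, $2 the rest of the URL.
    my ($domain, $path) = $url =~ m{^https?://([^/]+)/(.+)} or return;

    # Split on runs of non-letters, then drop empty and excluded words.
    my @words = grep { $_ ne '' && !$skip{$_} } split /[^A-Za-z]+/, $path;
    return ($domain, @words);
}

my @excl = qw(index html www http https com net org);
my ($domain, @words) = extract_words(
    "http://www.cnn.com/2002/fbi.exec.binladen/index.html", @excl);
print "domain: $domain\n";
print "$_.- $words[$_]\n" for 0 .. $#words;
```

For the sample URL this prints the domain www.cnn.com and the words fbi, exec and binladen. Note that a mixed token such as "fbi2" would come out as "fbi" here, because digits act as separators.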
_______Carlos Costa Portela_________________________________________________
| e-mail: ccosta at servidores.net | home page: http://casa.ccp.servidores.net |
|_______All grown-ups were once children, but few of them remember it________|