[thelist] some coding help request!

Carlos Costa Portela ccosta at servidores.net
Thu Jul 18 06:56:01 CDT 2002


On Thu, 18 Jul 2002, Erick Papadakis wrote:
> 1. the domain name
> 2. all the words in the string (non-symbols, and non-numbers, only
> characters) which are NOT from an exclusion list.

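	Perhaps something like this would be useful to you. It pulls the
	domain out of the URL, then collects the words in the path that are
	not in the exclusion list.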

	Hope this helps,
		Carlos.


#!/usr/bin/perl
use strict;
use warnings;

my $string = "http://www.cnn.com/2002/fbi.exec.binladen/index.html";
my $exclusion_list = "index|html|www|http|https|com|net|org";

my @excl = split(/\|/, $exclusion_list);
my ($domain, @result);

# Capture the host name in $1 and the rest of the path in $2.
if ($string =~ m{^https?://([^/]+)/(.+)}) {
        $domain = $1;
        # Split on runs of non-letters, so digits and symbols act as
        # separators; grep drops the empty leading field, if any.
        my @words = grep { length($_) } split(/[^A-Za-z]+/, $2);
        for my $word (@words) {
                my $is_here = 0;
                # Compare against the exclusion list, not the word list.
                for my $excluded (@excl) {
                        if ($word eq $excluded) {
                                $is_here = 1;
                                last;
                        }
                }
                push(@result, $word) if !$is_here;
        }
}

print "Domain: $domain\n" if defined $domain;
for my $i (0 .. $#result) {
        print "$i.- $result[$i]\n";
}
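
If the exclusion list grows, the nested loop can be replaced by a hash
lookup. The following is only a sketch of that variant, not a definitive
implementation: the sub name extract_words and the %is_excluded hash are
made up for illustration, and it assumes that only ASCII letters count as
word characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns the domain
# followed by the letter-only path words that are not excluded.
sub extract_words {
        my ($url, @excluded) = @_;
        # Build a lookup hash so each word is checked in constant time.
        my %is_excluded = map { ($_ => 1) } @excluded;
        # Accept http and https; the path part is optional here.
        return unless $url =~ m{^https?://([^/]+)(?:/(.*))?$};
        my ($domain, $path) = ($1, defined $2 ? $2 : '');
        # Split on runs of non-letters, then drop empty and excluded words.
        my @words = grep { length($_) && !$is_excluded{$_} }
                    split(/[^A-Za-z]+/, $path);
        return ($domain, @words);
}

my ($domain, @words) = extract_words(
        "http://www.cnn.com/2002/fbi.exec.binladen/index.html",
        qw(index html www http https com net org),
);
print "Domain: $domain\n";
print "$_\n" for @words;
```

For the sample URL this prints the domain www.cnn.com followed by fbi,
exec and binladen, one per line.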


 _______Carlos Costa Portela_________________________________________________
| e-mail:  ccosta at servidores.net | home page: http://casa.ccp.servidores.net |
|_______All grown-ups were children once, but few of them remember it________|



