[thelist] some coding help request!

Burhan Khalid burhankhalid at members.evolt.org
Thu Jul 18 03:44:00 CDT 2002


On Thursday 18 July 2002 01:07, Erick Papadakis wrote:
> my $string = "http://www.cnn.com/2002/fbi.exec.binladen/index.html";
> my $exclusion_list = "index|html|www|http|https|com|net|org";
>
>
> so, all i want is
>
> 1. name of domain, but this could be without www -- "cnn.com", or could
> be something like "robots.cnn.com"
>
>      cnn
>
> 2. words in the string that are not in the exclusion list
>
>     fbi, binladen, exec
>

Hey Erick :

I don't know the exact code in Perl, but I can give you an algorithm to get
you started.

1. Find the first instance of '//' in your target. The domain name starts
right after it. This also works for other internet protocols (like ftp://
telnet:// gopher://). The only thing that could break it would be something
like file:///C|/ (a file URL), where no domain name follows the '//'.

2. From the location at #1, find the next '/'. Everything between the two is
the domain name, which takes care of names like
this.is.a.really.long.name.com.

3. Split the target at the point found at #2, break both pieces into words,
and run the words against your exclusion list.
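
Here is a rough sketch of the three steps above. It is in Python rather than
Perl, since I can't vouch for Perl syntax, so treat it as something to port.
The function name split_url and the choice to break words on '/' and '.' are
my own assumptions, not anything from your code.

```python
import re

def split_url(url, exclusions):
    # Step 1: skip past the first '//' (the end of the protocol part).
    start = url.find("//")
    start = start + 2 if start != -1 else 0
    # Step 2: the domain name runs up to the next '/', or to the end
    # of the string if there is no path at all.
    end = url.find("/", start)
    if end == -1:
        end = len(url)
    host, path = url[start:end], url[end:]
    # Step 3: break both pieces into words and drop the excluded ones.
    excluded = set(exclusions.split("|"))
    host_words = [w for w in host.split(".") if w and w not in excluded]
    path_words = [w for w in re.split(r"[/.]", path) if w and w not in excluded]
    return host_words, path_words

host_words, path_words = split_url(
    "http://www.cnn.com/2002/fbi.exec.binladen/index.html",
    "index|html|www|http|https|com|net|org",
)
print(host_words)
print(path_words)
```

With your example this should leave "cnn" for the domain. Note that "2002"
survives along with fbi, exec and binladen, because it is not in the
exclusion list; you would need to add it or filter out numbers separately.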

Let me know if this *doesn't* have to be Perl -- because it would be easy to
whip up in PHP.

--
Burhan


