[thelist] some coding help request!
Burhan Khalid
burhankhalid at members.evolt.org
Thu Jul 18 03:44:00 CDT 2002
On Thursday 18 July 2002 01:07, Erick Papadakis wrote:
> my $string = "http://www.cnn.com/2002/fbi.exec.binladen/index.html";
> my $exclusion_list = "index|html|www|http|https|com|net|org";
>
>
> so, all i want is
>
> 1. name of domain, but this could be without www -- "cnn.com", or could
> be something like "robots.cnn.com"
>
> cnn
>
> 2. words in the string that are not in the exclusion list
>
> fbi, binladen, exec
>
Hey Erick :
I don't know the exact code in Perl, but I can give you an algorithm to get
you started.
1. Find the first instance of '//' in your target. This will get you to the
beginning of your domain name. This also takes into account other internet
protocols (like ftp://, telnet://, gopher://). The only thing that could break
it would be a file URL like file:///C|/, where the domain part is empty.
2. From the location at #1, find the next '/', which will give you the domain
name, and will take into account names like this.is.a.really.long.name.com.
3. Split the target at the point found in #2, break each part into words,
and check every word against your exclusion list.
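
The three steps above could be sketched in Perl roughly as follows. This is
only a minimal sketch, not tested against real-world URLs. The helper name
url_keywords is made up for illustration, and two things are assumptions of
mine rather than part of your question: the site name is taken to be the last
host label left after filtering, and purely numeric path segments (like 2002)
are dropped because your expected output leaves them out.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (site name, keywords) for a URL.
sub url_keywords {
    my ($url, $exclusion_list) = @_;

    # Step 1: locate the first '//' that ends the protocol prefix.
    my $start = index($url, '//');
    return if $start < 0;              # no '//' found, nothing to do
    $start += 2;

    # Step 2: the next '/' marks the end of the domain name.
    my $end = index($url, '/', $start);
    $end = length($url) if $end < 0;   # URL has no path at all

    my $host = substr($url, $start, $end - $start);
    my $path = substr($url, $end);

    # Step 3: split both parts into words and drop the excluded ones.
    my %skip = map { $_ => 1 } split /\|/, $exclusion_list;
    my @host_words = grep { !$skip{lc $_} } split /\./, $host;

    # Assumption: empty and purely numeric path segments are not keywords.
    my @path_words = grep { length($_) && !/^\d+$/ && !$skip{lc $_} }
                     split m{[/.]}, $path;

    # Assumption: the site name is the last host label that survives.
    return ($host_words[-1], @path_words);
}

my ($site, @words) = url_keywords(
    "http://www.cnn.com/2002/fbi.exec.binladen/index.html",
    "index|html|www|http|https|com|net|org",
);
print "$site\n";      # prints "cnn"
print "@words\n";     # prints "fbi exec binladen"
```

With "robots.cnn.com" as the host this still returns "cnn", since "com" is
filtered out and "cnn" is the last label left. A file URL like file:///C|/
gives an empty host, so the site name comes back undefined.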
Let me know if this *doesn't* have to be Perl, because it would be easy to
whip up in PHP.
--
Burhan