[thelist] testing for email address in perl

Anthony Baratta Anthony at Baratta.com
Sun Jan 13 15:17:50 CST 2002


At 10:17 PM 1/12/2002, Adrian Fischer wrote:

>I have a regex around here for a project I did to harvest emails from
>bounce messages. I'll see if I can find it.

Here's an early version of an email address harvester (from bounced messages) 
I had lying around.

The key line is: m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/

Hope this helps.
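
To see what the key pattern accepts and what it lets slip by, here is a small sketch that runs it against a few lines. The sample lines and the helper name first_address are made up for illustration; only the regex itself comes from this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The key pattern from this post, unchanged.
my $re = qr/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/;

# Hypothetical helper: return the first address found in a line, or undef.
sub first_address {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Made-up sample lines (not from any real bounce message).
my @samples = (
    "Delivery to <jane.doe\@example.com> failed\n",
    "user\@example.info rejected\n",
    "reply to bob\@example.org",
);

foreach my $line (@samples) {
    my $found = first_address($line);
    print defined $found ? "matched: $found\n" : "no match\n";
}
```

Only the first line matches. The second is skipped because .info has four letters and the pattern allows two or three. The third is skipped because nothing follows the address, so the trailing \W has no character to match. Lines read from a file normally end in a newline, so that last case mostly affects a final line with no newline.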

#!/usr/bin/perl

# take the name of the file to process from the cmd line:
my $file = $ARGV[0] || die "no file!\n";

# slurp the file into an array of @lines
open(F, "<", $file) or die "can't open $file: $!\n";
my @lines = <F>;
close F;

# hash to hold resulting addresses
my %addys;

foreach(@lines){
     # The conditions of this regex are that the address must start with a
     # word character ('\w') and end with a dot followed by two or three
     # word characters (e.g., .com or .uk), followed by a
     # non-word character:
     if(m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/){

         # grab the matched part of the expression captured by the first
         # matching set of ()'s:
         my $email = $1;

         # skip addresses matching any of these (ORed together):
         next if $email =~ m/ideasystems|daemon|postmaster/i;

         # lowercase the resultant address to avoid duplications:
         $email = lc($email);

         # stick in global hash:
         $addys{$email} = 1;
     }
}

foreach(keys(%addys)){
     print $_ . "\n";
}
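
Note that the if(m/.../) test above captures only the first address on each line. If a bounce lists several recipients on one line, a variation along these lines might pick them all up. This is a sketch, not part of the original script: the sub name all_addresses and the sample line are hypothetical, and the skip list is left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every address on a line, not just the first one.
# Same pattern as the script above; only the matching loop differs.
sub all_addresses {
    my ($line) = @_;
    my %seen;
    # With /g in scalar context, each pass resumes after the previous match.
    while ($line =~ m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/g) {
        $seen{ lc $1 } = 1;    # lowercase to fold duplicates
    }
    return sort keys %seen;
}

# Made-up bounce line containing two recipients.
my $line = "Failed: <Ann\@Example.com>, <bob\@example.org>\n";
print "$_\n" foreach all_addresses($line);
```

The while loop with the /g modifier keeps matching from where the last match ended, so both addresses on the sample line are collected.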

---
Anthony Baratta
President
Keyboard Jockeys

"Conformity is the refuge of the unimaginative."




