[thelist] testing for email address in perl
Anthony Baratta
Anthony at Baratta.com
Sun Jan 13 15:17:50 CST 2002
At 10:17 PM 1/12/2002, Adrian Fischer wrote:
>I have a regex around here for a project I did to harvest emails from
>bounce messages. I'll see if I can find it.
Here's an early version of an email address harvester (for bounced messages)
that I had lying around.
The key line is: m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/
Hope this helps.
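To see what that pattern accepts and rejects, here is a small standalone check. It is only a sketch: the sample lines and the extract() helper are made up for illustration and are not part of the harvester itself. Note that the trailing \W means an address sitting at the very end of a string (with no newline after it) is not matched, and that \w{2,3} rules out longer endings such as .info.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the key line, compiled once.
my $re = qr/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/;

# Hypothetical helper: returns the first address found, or undef.
sub extract {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Hypothetical sample lines, made up for illustration.
my @samples = (
    "Delivery to <jane.doe\@example.com> failed",
    "contact: al\@mail.example.co.uk;",
    "no address on this line",
    "bob\@example.org",        # nothing after the address, so \W fails
    "sue\@example.info ",      # four-letter ending, so \w{2,3}\W fails
);

for my $s (@samples) {
    my $hit = extract($s);
    printf "%-45s => %s\n", $s, defined $hit ? $hit : '(no match)';
}
```

In the script itself every line read from the file normally still ends in a newline, which counts as \W, so the end-of-string case only bites on a final line that has no trailing newline.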
#!/usr/bin/perl
use strict;
use warnings;

# take the name of the file to process from the cmd line:
my $file = $ARGV[0] || die "no file!\n";

# slurp the file into an array of @lines
open(my $fh, '<', $file) or die "can't open $file: $!\n";
my @lines = <$fh>;
close $fh;

# hash to hold resulting addresses
my %addys;

foreach (@lines) {
    # The real conditions of this regex are that the address must start
    # with a valid word character ('\w') and must end with a dot and
    # either two or three word characters (e.g., .com or .uk), followed
    # by a non-word character. Only the first address on a line is kept.
    if (m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/) {
        # grab the matched part of the expression captured by the first
        # matching set of ()'s:
        my $email = $1;

        # list of things to skip, ORed together:
        next if $email =~ m/ideasystems|daemon|postmaster/i;

        # lowercase the resultant address to avoid duplicates:
        $email = lc($email);

        # stick it in the global hash:
        $addys{$email} = 1;
    }
}

# print each unique address, one per line
foreach (keys(%addys)) {
    print $_ . "\n";
}
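The loop above keeps only the first address on each line. If a bounce message puts several addresses on one line, one possible variation is to run the same pattern with the /g modifier in list context, which returns every capture. This is a sketch under that assumption, not part of the original script: addresses_in() and the sample line are hypothetical, and the skip list is left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every address on a line, lowercased.
sub addresses_in {
    my ($line) = @_;
    # In list context, m//g returns the first capture group of each match.
    my @found = $line =~ m/(\w[-.\w]+\@[-.\w]+\.\w{2,3})\W/g;
    return map { lc($_) } @found;
}

my %addys;

# Hypothetical bounce line, made up for illustration.
my $line = "To: Ann\@Example.com, bob\@example.net; cc ann\@example.com\n";

# The hash collapses duplicates, as in the original script.
$addys{$_} = 1 for addresses_in($line);

print "$_\n" for sort keys %addys;
```

Each match also consumes the non-word character after the address, so two addresses separated by a single comma or space are still both picked up.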
---
Anthony Baratta
President
Keyboard Jockeys
"Conformity is the refuge of the unimaginative."