[thelist] Why Does SpamAssassin Hate America? (Or at least just me?)

Martin Burns martin at easyweb.co.uk
Fri Apr 4 05:54:10 CDT 2008


On 3 Apr 2008, at 22:31, JS Bracher wrote:

> So, in a world where even some techies are fed up, how do you
> communicate effective means of blocking spam?  What is the difference
> between statistically based filtering vs. heuristics?  And how do I do
> the latter and avoid the former?

Chris already explained a bit about statistical filtering.
Essentially, the system breaks the mail down into small units (tokens),
and assigns each unit a probability of spamminess. When a new mail comes
in, the system looks for units it recognises, and combines their
spamminess probabilities. If the result goes over a specified threshold -
some systems allow the user to adjust the sensitivity - it classifies the
entire mail as spam, to be dealt with as necessary.

When you manually (re)classify a mail, the spamminess probabilities of
the units found in that mail are updated. Over time, the accuracy improves
as the system is better able to understand what *you* feel to be spam.
This makes it *really* hard for spammers to game, as the effective
rulesets are individual and evolving.
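
To make the statistical approach concrete, here is a minimal sketch in
Python. It is a toy, not how any particular filter actually works: the
class name, the tokenizer, the add-one smoothing and the rule for
combining per-token probabilities (a naive-Bayes-style product) are all
my assumptions for illustration.

```python
import re
from collections import Counter


class TokenFilter:
    """Toy per-user statistical filter (hypothetical, for illustration)."""

    def __init__(self, threshold=0.9):
        self.spam_tokens = Counter()  # token -> number of spam mails containing it
        self.ham_tokens = Counter()   # token -> number of non-spam mails containing it
        self.spam_mails = 0
        self.ham_mails = 0
        self.threshold = threshold    # the user-adjustable sensitivity

    @staticmethod
    def tokenize(text):
        # Break the mail into small units; each unit is counted once per mail.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        # Called whenever the user (re)classifies a mail.
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_mails += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_mails += 1

    def token_probability(self, token):
        # Add-one smoothing keeps the estimate away from exactly 0 or 1.
        spam_rate = (self.spam_tokens[token] + 1) / (self.spam_mails + 2)
        ham_rate = (self.ham_tokens[token] + 1) / (self.ham_mails + 2)
        return spam_rate / (spam_rate + ham_rate)

    def score(self, text):
        # Only units the filter has seen before contribute to the score.
        known = [t for t in self.tokenize(text)
                 if t in self.spam_tokens or t in self.ham_tokens]
        if not known:
            return 0.5  # nothing recognised: no opinion either way
        spam_product = 1.0
        ham_product = 1.0
        for token in known:
            p = self.token_probability(token)
            spam_product *= p
            ham_product *= 1 - p
        return spam_product / (spam_product + ham_product)

    def is_spam(self, text):
        return self.score(text) > self.threshold
```

Calling train() again on a mail you have reclassified shifts the
probabilities of its tokens, which is the "individual and evolving"
part: two users who train on different mail end up with different
filters.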

Heuristics on the other hand are manually defined rules such as:
"if mail contains 'viagra' or 'v14gr4' or [etc] then it's more likely  
to be spam, unless it's from my dad when it's likely to be a joke"
"if mail contains ebay and doesn't contain my user name then it's spam"
"if mail contains ebay and comes from a non-ebay server then it's spam"
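
As a rough sketch, the three example rules above could be written like
this in Python. The Mail fields, the addresses, the eBay user name, the
score weights and the threshold are all made up for illustration; real
rulesets (SpamAssassin's included) are far larger and weighted
differently.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    sender: str      # address in the From header
    relay_host: str  # host name of the server that delivered the mail
    body: str


# Hypothetical personal settings; substitute your own.
MY_EBAY_USER = "example_user"
DAD = "dad@example.org"
VIAGRA_SPELLINGS = ("viagra", "v14gr4")


def from_ebay_server(host):
    host = host.lower()
    return host == "ebay.com" or host.endswith(".ebay.com")


def heuristic_score(mail):
    """Add up points for each hand-written rule that matches."""
    body = mail.body.lower()
    score = 0
    # Rule 1: drug-name spellings make spam more likely, unless it's from Dad.
    if any(word in body for word in VIAGRA_SPELLINGS) and mail.sender != DAD:
        score += 2
    if "ebay" in body:
        # Rule 2: genuine eBay mail addresses me by user name.
        if MY_EBAY_USER not in body:
            score += 5
        # Rule 3: genuine eBay mail arrives from an eBay server.
        if not from_ebay_server(mail.relay_host):
            score += 5
    return score


def is_spam(mail, threshold=5):
    return heuristic_score(mail) >= threshold
```

Every rule, weight and exception here had to be thought up and typed in
by hand, which is exactly the maintenance burden.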

Here you have to work out what spam means to you and define logical
rules to cover all eventualities. This is nigh-on impossible for an
individual with a day job to do, so most people use publicly
available rulesets. Trouble is, so do the spammers, and they actively
test against those rulesets. And I've even seen spam that forges
SpamAssassin mail headers.

Lots more useful reading at
http://www.paulgraham.com/antispam.html
particularly
http://www.paulgraham.com/spam.html

Other good stuff known to be helpful, particularly against botnets:
http://en.wikipedia.org/wiki/Greylisting
http://en.wikipedia.org/wiki/Sender_Policy_Framework

If you are really, really, *really* sure you're not going to get valid
mails from these countries, blocking their entire netblocks from making
TCP connections also helps:
* China
* Korea
* Brazil
http://www.fadden.com/techmisc/asian-spam.htm
http://www.easyweb.co.uk/Members/martin/blog/blog_post.2005-01-10.5805268191
http://www.ipdeny.com/
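
In practice you'd do this blocking at the firewall or in the MTA, but
the underlying test is just "is this address inside one of the listed
netblocks?". A minimal sketch using Python's standard ipaddress module
follows; the CIDR ranges are reserved documentation ranges standing in
as placeholders, not real country allocations, and the function name is
my own.

```python
import ipaddress

# Placeholder netblocks (reserved documentation ranges), NOT real
# country allocations; a real list would come from a per-country source.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(address):
    """Return True if the connecting address falls in a blocked netblock."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETBLOCKS)
```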

However, nearly all newly proposed solutions fail dismally. See
http://craphound.com/spamsolutions.txt
for a generic response to wide-eyed evangelists.

Cheers
Martin
--
 > Spammers: Send me email -> yumyum at easyweb.co.uk to train my filter
 > http://dspam.nuclearelephant.com/







