[thelist] Why Does SpamAssassin Hate America? (Or at least just me?)
Martin Burns
martin at easyweb.co.uk
Fri Apr 4 05:54:10 CDT 2008
On 3 Apr 2008, at 22:31, JS Bracher wrote:
> So, in a world where even some techies are fed up, how do you
> communicate effective means of blocking spam? What is the difference
> between statistically based filtering vs. heuristics? And how do I do
> the latter and avoid the former?
Chris already explained a bit about statistical filtering.
Essentially, the system breaks the mail down into small units, and
assigns each unit a probability of spamminess. When a new mail comes
in, the system looks for units it recognises, and combines their
spamminess probabilities. If the result goes over a specified
threshold - some allow the user to specify the sensitivity - it
classifies the entire mail as spam to be dealt with as necessary.
When you manually (re)classify a mail, the spamminess probabilities of
the units found in that mail are updated. Over time, the accuracy
improves as the system is better able to understand what *you* feel to
be spam. This makes it *really* hard for spammers to game, as the
effective rulesets are individual and evolving.
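To make that concrete, here is a rough sketch of a per-user statistical
filter. It is not how DSPAM or SpamAssassin actually do it: the class
name, the tokeniser, the smoothing and the log-odds combination are all
assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


class TokenFilter:
    """Toy per-user statistical filter (hypothetical, for illustration only)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # user-adjustable sensitivity
        self.spam = Counter()       # token -> number of spam mails containing it
        self.ham = Counter()        # token -> number of good mails containing it

    @staticmethod
    def tokens(text):
        # The "small units" here are lowercase word-like tokens, deduplicated.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        # Called whenever the user (re)classifies a mail.
        (self.spam if is_spam else self.ham).update(self.tokens(text))

    def token_probability(self, token):
        # Smoothed estimate; a token never seen before comes out at 0.5.
        s, h = self.spam[token], self.ham[token]
        return (s + 1) / (s + h + 2)

    def score(self, text):
        # Combine the per-token probabilities by summing their log-odds.
        log_odds = 0.0
        for token in self.tokens(text):
            p = self.token_probability(token)
            log_odds += math.log(p / (1 - p))
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))

    def is_spam(self, text):
        return self.score(text) > self.threshold
```

Every call to train() shifts the per-token counts, so two users who
classify their mail differently end up with different filters.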
Heuristics on the other hand are manually defined rules such as:
"if mail contains 'viagra' or 'v14gr4' or [etc] then it's more likely
to be spam, unless it's from my dad when it's likely to be a joke"
"if mail contains ebay and doesn't contain my user name then it's spam"
"if mail contains ebay and comes from a non-ebay server then it's spam"
Here you have to work out what spam means to you and define logical
rules to cover all eventualities. This is nigh-on impossible for an
individual with a day job to do, so most people use publicly
available rulesets. Trouble is, so do the spammers, and they actively
test against those rulesets. And I've even seen spam that forges
SpamAssassin mail headers.
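For comparison, the three example rules above might look something like
this when written out by hand. The weights, the threshold, the user
name and Dad's address are all made up for illustration, and real
rulesets (SpamAssassin's included) are far larger and use their own
rule syntax.

```python
MY_EBAY_USER = "example_user"  # hypothetical account name
DAD = "dad@example.com"        # hypothetical trusted sender


def heuristic_score(sender, sending_host, body):
    """Return a score; higher means more likely to be spam."""
    text = body.lower()
    host = sending_host.lower()
    score = 0
    if any(word in text for word in ("viagra", "v14gr4")):
        # More likely spam, unless it is from Dad, when it is probably a joke.
        score += 0 if sender.lower() == DAD else 3
    if "ebay" in text:
        if MY_EBAY_USER not in text:
            score += 5  # mentions ebay but not my user name
        if not (host == "ebay.com" or host.endswith(".ebay.com")):
            score += 5  # mentions ebay but came from a non-ebay server
    return score


def is_spam(sender, sending_host, body, threshold=5):
    return heuristic_score(sender, sending_host, body) >= threshold
```

Note that every rule, weight and exception is fixed until somebody edits
it, which is exactly what lets spammers test against a published
ruleset.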
Lots more useful reading at
http://www.paulgraham.com/antispam.html
particularly
http://www.paulgraham.com/spam.html
Other good stuff known to be helpful, particularly against botnets:
http://en.wikipedia.org/wiki/Greylisting
http://en.wikipedia.org/wiki/Sender_Policy_Framework
If you are really, really, *really* sure you're not going to get valid
mails from these countries, blocking the entire netblock from making
TCP connections also helps:
* China
* Korea
* Brazil
http://www.fadden.com/techmisc/asian-spam.htm
http://www.easyweb.co.uk/Members/martin/blog/blog_post.2005-01-10.5805268191
http://www.ipdeny.com/
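In practice you'd do this in the firewall or the MTA rather than in
code, but the check itself boils down to something like the sketch
below. The two netblocks are placeholder documentation ranges, not real
country allocations, and the function name is just an assumption; a
real list would come from a source like the ones linked above.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks, NOT real
# country allocations.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(peer_address):
    """True if the connecting IP falls inside any blocked netblock."""
    ip = ipaddress.ip_address(peer_address)
    return any(ip in net for net in BLOCKED_NETBLOCKS)
```

A connection from 192.0.2.55 would be refused here, while one from an
address outside both ranges would be let through to the other filters.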
However, nearly all newly proposed solutions fail dismally. See
http://craphound.com/spamsolutions.txt
for a generic response to wide-eyed evangelists.
Cheers
Martin
--
> Spammers: Send me email -> yumyum at easyweb.co.uk to train my filter
> http://dspam.nuclearelephant.com/