[thelist] Why Does SpamAssassin Hate America? (Or at least just me?)

Martin Burns martin at easyweb.co.uk
Fri Apr 4 05:21:48 CDT 2008


On 3 Apr 2008, at 22:57, Chris Anderson wrote:

>> So, in a world where even some techies are fed up, how do you
>> communicate effective means of blocking spam?  What is the difference
>> between statistically based filtering and heuristics?  And how do I do
>> the latter and avoid the former?
>
>
> Statistical spam filters based on Bayesian filters are probably what you
> are looking for.

...and related kinds of algorithms.

> You teach it what is spam and what is not, then it uses what it's learnt
> to filter incoming spam.
>
> Of course this does mean there is some ramp-up time, but after a short
> while it begins to get very good at it!

There are plenty of spam corpora around, and of course you have your
existing good mail to use as a ham corpus. Using these, the filter can
be trained pretty rapidly.
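
As a rough illustration of that training step, here is a minimal Python
sketch of a naive Bayes style token filter trained from a spam corpus and
a ham corpus. It is not the actual algorithm used by DSPAM or
SpamAssassin; the class name NaiveBayesFilter, the tokenizer, and the
add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Toy per-token naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label, times=1):
        # Count each distinct token once per message, repeated `times` times.
        for token in set(tokenize(text)):
            self.counts[label][token] += times
        self.messages[label] += times

    def spam_probability(self, text):
        n_spam = self.messages["spam"]
        n_ham = self.messages["ham"]
        if n_spam + n_ham == 0:
            return 0.5  # untrained filter: no opinion
        # Work in log-odds, with add-one smoothing so unseen tokens
        # never produce a zero probability.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for token in set(tokenize(text)):
            p_spam = (self.counts["spam"][token] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp before exponentiating to avoid float overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Usage would look something like this: call train(text, "spam") for each
message in the spam corpus and train(text, "ham") for each message in
your good mail, then treat anything where spam_probability(text) comes
out above a threshold you choose (0.9, say) as spam.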

> You can get server-based Bayesian filters, but of course it's no longer
> "personalised",

Depends. Good ones (DSPAM for example) can be set up on a per-user  
basis without too much trouble.

> but instead is based on the "teachings" of other people
> using their client-side filters and is again very good. If a new spam
> format comes out, once enough people get a spam message the server-based
> version will start to know about the new format and filter them.


Also useful: honeypot addresses like yumyum at easyweb.co.uk. Bots can  
harvest this from mailing list archives, and DSPAM trains multiple  
times on anything sent to that address, knowing it's spam. It learns  
new formats pretty quickly :-)
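
The honeypot idea can be sketched in a few lines of Python. This is only
an illustration of the principle, not how DSPAM implements it: the
address honeypot@example.com, the weight of 5, and the function names
are all made up for the example, and only plain single-part messages are
handled.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical values: substitute your own honeypot address and weight.
HONEYPOT_ADDRESSES = {"honeypot@example.com"}
HONEYPOT_WEIGHT = 5  # train this many times on each honeypot hit


def recipients(msg):
    """Return the lowercased To/Cc addresses of a parsed message."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def train_from_honeypot(raw_message, spam_token_counts):
    """If the message was sent to a honeypot, add its tokens to the
    spam counts HONEYPOT_WEIGHT times. Returns True if it trained."""
    msg = message_from_string(raw_message)
    if not recipients(msg) & HONEYPOT_ADDRESSES:
        return False
    body = msg.get_payload()
    if not isinstance(body, str):
        return False  # multipart mail is out of scope for this sketch
    # Nothing legitimate is ever sent to the honeypot, so no human
    # confirmation is needed before counting the message as spam.
    for token in set(body.lower().split()):
        spam_token_counts[token] += HONEYPOT_WEIGHT
    return True
```

Mail to any other address returns False and leaves the counts alone, so
it would go through the normal classify-and-correct path instead.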

M

--
 > Spammers: Send me email -> yumyum at easyweb.co.uk to train my filter
 > http://dspam.nuclearelephant.com/







