[thelist] Spam filters

Ken Schaefer ken.schaefer at gmail.com
Tue Sep 14 18:39:11 CDT 2004


Hi,

I disagree with Shawn's conclusions:
- large numbers of corporations send out mail in HTML format
- most people don't realise the dangers of HTML mail, and do not
manually disable this in their mail clients
- most people use a "canned" off-the-shelf spam filtering product.
These do not arbitrarily flag HTML mail as spam (Shawn's filtering
seems to be something he's set up himself, with an anti-HTML mail
bias).

Firstly, I think you need to find out what types of products your
clients are using, and then see if there's something that that
software is flagging as an issue. For example, if you're having
problems with Cloudmark users, it may be because one or more of your
clients has mistakenly marked your email as spam.

The most important factors that I can see are:
a) your sending IP address is listed in one or more blacklists. I
would rate this by far the most important issue simply because it's
used by so many spam filtering products. Being listed in one or more
of the larger blacklists will mean that clients using a wide array of
spam filtering products will flag your mail
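
As a rough illustration of how such a check works: DNS-based blacklists
are normally queried by reversing the octets of the sending IP address
and looking that name up under the list's zone. The sketch below assumes
IPv4 only, and the zone name dnsbl.example.org is a placeholder rather
than a real list; substitute the zones of the lists you care about.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used for a blacklist lookup: reversed octets + zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the list has an entry for this IP.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Name did not resolve. Note that this sketch cannot tell
        # "not listed" apart from a DNS failure.
        return False
```

Calling is_listed("192.0.2.10", "dnsbl.example.org") would look up
10.2.0.192.dnsbl.example.org; any answer means the address is listed.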

b) your DNS entries have SPF records showing what hosts are legitimate
sending hosts, but the email is coming from somewhere else. Add the
extra host as a legitimate source of mail for your domain, or remove
your SPF records
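
To make the mismatch concrete, here is a deliberately simplified sketch
that checks a sending IP against only the ip4:/ip6: terms of an SPF
record. Real SPF evaluation also handles include:, a, mx, redirect= and
more, which this ignores. The record and addresses are made-up examples
from the documentation ranges.

```python
import ipaddress


def spf_allows(spf_record, sender_ip):
    """Return True if an ip4:/ip6: term with a '+' qualifier covers sender_ip.

    Simplification: every other mechanism (all, include:, a, mx, ...) is
    skipped, so an unmatched address is simply reported as not allowed.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier = "+"  # a term with no qualifier means "pass"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        if name.lower() in ("ip4", "ip6") and value:
            if ip in ipaddress.ip_network(value, strict=False):
                return qualifier == "+"
    return False


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_allows(record, "192.0.2.25"))    # host inside the published range
print(spf_allows(record, "198.51.100.7"))  # the "somewhere else" host
```

Adding ip4:198.51.100.7 to the record is the "add the extra host" fix.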

c) Lexical analysis of the message (eg Bayesian Filtering). This will
look for key phrases (and their permutations): Viagra V1agra V!agra
etc, and compare the ratio of these "bad" phrases to all phrases (to
get around this, a lot of spam includes large slabs of legitimate text
at the end), and the proximity of these phrases to one another (so
having a lot of "bad" words close together, followed by a large slab
of legitimate text will still get the mail flagged as spam)
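
A toy version of the ratio and proximity ideas is sketched below. It is
plain keyword counting, not an actual Bayesian classifier, and the word
list, the look-alike character table and the window size are all
assumptions chosen for illustration.

```python
import re

# Hypothetical word list and character substitutions, for illustration only.
BAD_WORDS = {"viagra"}
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "0": "o", "@": "a", "$": "s", "3": "e"})


def tokens(text):
    """Lower-case tokens with look-alike characters folded into letters.

    Folding also mangles ordinary numbers, which is harmless here because
    tokens are only ever compared against BAD_WORDS.
    """
    raw = re.findall(r"[A-Za-z0-9!@$]+", text)
    return [t.lower().rstrip("!").translate(LOOKALIKES) for t in raw]


def bad_ratio(text):
    """Share of all tokens in the message that are on the bad-word list."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(t in BAD_WORDS for t in toks) / len(toks)


def densest_window(text, size=10):
    """Highest bad-word share found in any run of `size` consecutive tokens."""
    flags = [t in BAD_WORDS for t in tokens(text)]
    if not flags:
        return 0.0
    size = min(size, len(flags))
    best = max(sum(flags[i:i + size]) for i in range(len(flags) - size + 1))
    return best / size
```

Padding a message with legitimate text pulls bad_ratio down, but
densest_window stays high because the bad words are still clustered.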

d) Ratio of images to text - large number of images, very little text
= possible spam
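
One way a filter might measure this, sketched with Python's standard
HTML parser: count the img tags and the visible text characters, then
compare text per image against a threshold. The threshold itself is left
out because it would be the filter's own choice.

```python
from html.parser import HTMLParser


class ImageTextCounter(HTMLParser):
    """Counts <img> tags and non-whitespace text characters in an HTML body."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style contents and all whitespace.
        if not self._skip:
            self.text_chars += len("".join(data.split()))


def chars_per_image(html):
    """Text characters per image, or None when the message has no images."""
    counter = ImageTextCounter()
    counter.feed(html)
    counter.close()
    if counter.images == 0:
        return None
    return counter.text_chars / counter.images
```

A low value (many images, little text) is the pattern described above.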

e) Size of messages - most spam is quite small (typically less than
4-5kb). Messages that are large (say, >30kb) generally aren't spam.
Messages that include attachments generally aren't spam either (they
could very well be viruses, but spam filtering products are not geared
to fight viruses - AV software does that)
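
A sketch of that heuristic using the standard email module. The two
byte limits are taken from the rough figures above; the returned labels
are invented for this example, and a real filter would fold this into a
score rather than return a verdict.

```python
from email import message_from_string

SMALL_BYTES = 5 * 1024   # "typically less than 4-5kb"
LARGE_BYTES = 30 * 1024  # "say, >30kb"


def size_hint(raw_message):
    """Classify a raw message by size and by whether it carries attachments."""
    msg = message_from_string(raw_message)
    has_attachment = any(
        part.get_content_disposition() == "attachment" for part in msg.walk()
    )
    size = len(raw_message.encode("utf-8"))
    if has_attachment or size > LARGE_BYTES:
        return "less likely spam"
    if size < SMALL_BYTES:
        return "typical spam size"
    return "neutral"
```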

f) Miscellaneous sundry properties:
- the content encoding type of the message (if it's Korean for
example, and you are using an English computer, some spam filtering
products will add a weight to this fact)
- whether the text is in ALL CAPS, like YOU HAVE WON $1,000,000
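
Both of these properties are easy to measure. The sketch below computes
the share of upper-case letters and collects the character sets a
message declares; how much weight either one gets is up to the filter
and is not shown.

```python
from email import message_from_string


def caps_ratio(text):
    """Share of alphabetic characters that are upper case."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def declared_charsets(raw_message):
    """Lower-cased charsets named in the Content-Type headers of all parts."""
    msg = message_from_string(raw_message)
    found = set()
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            found.add(charset)
    return found


print(caps_ratio("YOU HAVE WON $1,000,000"))  # every letter is upper case
```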

Cheers
Ken


On Tue, 14 Sep 2004 14:33:41 +0100, Jason Handby <jasonh at corestar.co.uk> wrote:
> Hi all,
> 
> I have been working on a website which allows people to register and then
> search a database according to various criteria. Once the user has finished
> searching, the site emails them a detailed report on their search results
> that they can refer back to. This email is formatted as HTML.
> 
> Recently the client has noticed that they are having more problems with
> emails being blocked by users' SPAM filters and not getting through. We know
> that the email is not SPAM, but clearly it looks like SPAM as far as (at
> least some) filters are concerned.
> 
> Do any of you have any general suggestions / guiding principles for creating
> emails that don't look like SPAM? Are HTML emails more likely to cause
> problems than plain-text ones? Does the length of the email matter? Are
> there any key phrases we should be avoiding? (We're not selling Viagra :-) )
> 
> Any experiences / suggestions / links to resources would be gratefully
> received.
> 
> Ta
> 
> Jason

