[thelist] Email alterations

Daniel J. Cody djc at members.evolt.org
Wed Feb 20 23:34:01 CST 2002


Hey Michael -

Thanks for the compliments, glad you're liking it :)

Stripping out the junk and rewriting the entire message is difficult
(earlier this week, for example, a content-type rewriting problem that I
missed popped up). There are a lot of
content-transfer-encoding/content-type combos to be dealt with, but it's
all in the RFCs I guess :)

It's actually *not* very resource hungry. If you'll indulge me for a
moment, we'll examine the lifecycle of an HTML-formatted email going to
thelist...

When an incoming email gets to lists.evolt.org, sendmail checks its
alias file for 'thelist at lists.evolt.org', sees a program name
instead of an email address, and passes the message to the program. The
program is a little Python script that does some low-level checking for
things like the correct 'To:' address, making sure it's not addressed to
too many people (spam counter), and things of that nature. If it doesn't
like what it sees, it passes it to another program that sends it to
thelist-admin (me).
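
To make that first step a bit more concrete, here is a minimal sketch of
what such a pre-check could look like. It is not the actual evolt script:
the list address, the recipient limit and the function name are all
assumptions made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical values, chosen only for this illustration.
LIST_ADDRESS = "thelist@lists.example.org"
MAX_RECIPIENTS = 10


def precheck(raw_message):
    """Return (ok, reason) for a raw RFC 822 message string."""
    msg = message_from_string(raw_message)
    # Collect every address named in the To: and Cc: headers.
    pairs = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    addresses = [addr.lower() for _name, addr in pairs if addr]
    if LIST_ADDRESS not in addresses:
        return False, "list address missing from To/Cc"
    if len(addresses) > MAX_RECIPIENTS:
        return False, "too many recipients"
    return True, "ok"
```

A message that fails would be handed to whatever forwards it to the list
admin; one that passes would be written to the queue.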

Assuming everything is kosher to that little checking script, it writes
the file to a queue for the mailing list manager (MLM). The MLM checks
that queue every second for a new entry, and when it finds one, it loads
the email file into memory and does some more checking to make sure
it's a plain text email. If it's not, a number of rewrites happen to
the message so it becomes plain text, and any attachments are stripped.
Either way, it comes out squeaky clean and ready for delivery.
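
For the curious, the plain-text rewrite can be sketched with Python's
standard email package. This is only an illustration under my own
assumptions (the function names, the list of kept headers and the crude
tag-stripping fallback for HTML-only mail are all invented here), not the
MLM's real code, which has to cope with far more
content-transfer-encoding/content-type combos.

```python
import html
import re
from email import message_from_string
from email.message import EmailMessage

# Assumed set of headers worth carrying over to the cleaned message.
KEPT_HEADERS = ("From", "To", "Subject", "Date", "Message-ID")


def decode_part(part):
    """Decode a leaf MIME part to text, undoing base64/quoted-printable."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name
        return payload.decode("utf-8", errors="replace")


def to_plain_text(raw_message):
    """Return a new text/plain message with attachments dropped."""
    msg = message_from_string(raw_message)
    plain, html_text = None, None
    for part in msg.walk():
        # Skip containers and anything that carries a filename.
        if part.is_multipart() or part.get_filename():
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = decode_part(part)
        elif ctype == "text/html" and html_text is None:
            html_text = decode_part(part)
    if plain is None and html_text is not None:
        # Crude fallback: drop tags, then unescape entities.
        plain = html.unescape(re.sub(r"<[^>]+>", "", html_text))
    clean = EmailMessage()
    for name in KEPT_HEADERS:
        if msg[name]:
            clean[name] = " ".join(str(msg[name]).split())
    clean.set_content(plain or "")
    return clean
```

The first text/plain part wins; HTML is only used when no plain part
exists.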

Anyone remember the "I'm just a Bill" cartoon they used to play on
Saturday mornings here in the US that chronicled the lifecycle of a US
law? That was great... :)

So, once we have a nice clean email, a copy gets appended to the digest
file that will get sent out at the end of the day for the folks that are
on the digest. For the rest of us, list-specific headers that help with
list management are written into the header, and a quip file that
contains a couple hundred x-evolt header lines is opened; one is
selected at random and put into the x-evolt: header line. A copy of the
message is also written to the archives in the correct thread. The MLM
then polls the list DB (Berkeley style) for people who get every message
individually, pipes their email addresses into the To: in the header and
shoots them
off in chunks of 100 to one of the three relay.evolt.org servers that
run Postfix. Whew.

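
The random quip and the chunks of 100 are simple enough to sketch in a
few lines. The function names below are hypothetical and the quip list
is a stand-in; only the chunk size of 100 comes from the description
above.

```python
import random

CHUNK_SIZE = 100  # chunk size mentioned in the description above


def pick_quip(quips, rng=random):
    """Pick one quip at random for the x-evolt: header."""
    return rng.choice(quips)


def chunk_recipients(addresses, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` addresses."""
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]
```

Each chunk would then be handed to one of the relay servers; how the
relay is chosen is not covered here.
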
Whichever of the three gets to deliver the message to you sends
the complete email to your SMTP server. If it bounces for whatever
reason, it gets shot back to another Python script that does some
automatic checking (e.g. if this email bounces more than 5 times in 24
hours, unsubscribe that person), and is then sent back to me.
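
The "more than 5 bounces in 24 hours" rule can be sketched as a small
sliding-window counter. This is an in-memory illustration only; the
class name and the way state is stored are assumptions, and the real
script presumably keeps its counts somewhere persistent.

```python
import time
from collections import defaultdict, deque

BOUNCE_LIMIT = 5                 # "more than 5 times"
WINDOW_SECONDS = 24 * 60 * 60    # "in 24 hours"


class BounceTracker:
    """Track recent bounces per address (hypothetical helper)."""

    def __init__(self, limit=BOUNCE_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._bounces = defaultdict(deque)

    def record(self, address, now=None):
        """Record a bounce; return True if the address should be unsubscribed."""
        now = time.time() if now is None else now
        stamps = self._bounces[address]
        stamps.append(now)
        # Forget bounces that fell out of the 24-hour window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps) > self.limit
```

With these defaults the sixth bounce inside the window is the one that
trips the unsubscribe.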

That's just one email :) For thelist alone, that's almost 100
incoming (3000 outgoing) emails a day. Throw in the other evolt lists,
and it's about 5000 a day. Time from when it hits lists.evolt.org to the
time it leaves one of the relay servers: 45-60 seconds total (half of
that being queuing). So, to answer your question, it's not server hungry
at all. Glad you asked? :)

In the same way that multi-part messages are stripped clean, I'm working
on something similar that will strip footers, but that's a bit more
difficult.

If it's not totally apparent, I love talking about this stuff, so shoot
me any other questions if you have them :)

.djc.

Michael Pemberton wrote:
> I am amazed to see how many new features have come to be added to
> thelist in
> recent times.  Great work guys.
>
> I see that it is now possible to strip the txt/plain part out of a mime
> formated message.  How hard / server hungry is this kind of thing?  I was
> wondering, is it also possible to strip out the thelist footer and other
> such footers (hotmail / yahoo for example).
>
> As someone who is limited to using hotmail at work, I understand the
> annoyances of having this appear at the end of each of my posts.



