[theforum] Lack of evolt Listmastery (was: Re: [thelist] Tip Harvest for the Week of Monday Oct 09, 2006)

David Kaufman david at gigawatt.com
Sun Oct 22 02:15:22 CDT 2006


Hi Matt,

Matt Warden <mwarden at gmail.com> wrote:
> On 10/17/06, Chris Hayes <chris at lwcdial.net> wrote:
> thelist has been queuing up periodically for a month or two, now. I
> think Dean Mah used to catch this previously. I don't know who is
> handling it now.

That would be me.  Or rather, that would be me, not handling it *well* 
now.  Dean resigned the job of list-owner and spam-shoveler.  After Alan 
and I (remember Alan?) volunteered to take over Dean's day-to-day duties,
we were shocked by (and I'm still in awe of) the amount of work Dean did.
I'm sure I'll never be able to fill his shoes and give evolt the amount 
of daily attention that he did.  I'm doing all that I can.

Yes, the lists do "queue up" periodically.  In fact, mailman hangs
sometimes, as does qmail, leaving hundreds (or thousands) of messages
piling up in one or another of mailman's or qmail's various queues.  I 
sifted through Dean's .history file to try and glean some of his 
techniques for un-jamming them, but still usually have to resort to 
kill-9ing and restarting something and then waiting, and re-checking 10, 
30 or sometimes 90 minutes later to see if the plumbing is finally, 
slowly unclogging, and then maybe starting over and doing it again 
before the data starts flowing freely again (invoke memory of the last 3 
minutes of the movie "Tron" here, after the Master Control Program is 
destroyed, and the IO tower lights up with the data of all the happy 
programs, free once more.)

Although I used to consider myself a competent Linux admin with a few 
years under my belt successfully administering both mailman and qmail, 
I've been humbled and forced to revisit that world-view after working 
with evolt's lists for just this short time.  The volume of spam and 
viruses that we block, bounces that we process... it's pretty intense.  I
volunteered because I knew (or had heard -- "knowing" experientially is 
a different thing altogether :-)) that this was the case, and wanted the 
on-the-job experience under my belt of having managed a high traffic 
list server.

Problem is (as is so often the case) I haven't had the time to delve as 
deeply as I'd hoped.

I do still hope to remedy that!  But I do also have -- y'know, the
job, the commute (now 150 miles, one way, most weeks all 5 days... 
uphill both ways :-)), the kids, the wife and what's left over of the 
life to tend to as well.  So I hesitate to *promise* that service will 
improve Real Soon Now.

Anyone who maybe *does* have the actual high-traffic listmaster chops 
(especially those with real-world mailman + qmail + spamassassin 
crisis-management experience!) and time to spare is certainly encouraged 
to apply for this or any of the other exciting career opportunities at 
evolt industries:  I for one am seeking list co-owners, list helpers, 
list-software/admin tutors, well-wishers and also cheerleaders.  The pay 
is fairly modest (I shouldn't mention the exact salary in public but 
will just say that dividing by it will throw a run-time exception), you 
can set your own hours, and can (well, must) telecommute :-)

So, sarcasm aside, the recent list outages are indeed my bad.  When I 
know about them, I do stay up at night trying to fix them (/glances at 
watch).  The most recent one I became aware of only after its second day 
and after it had already healed (or been fixed by another sysadmin who's 
yet to take credit -- we have I suspect several such humble good 
samaritans) when I miraculously noticed an email from Eric Meyer 
entitled "css-discuss dead (again)" buried under 600 or so spam messages 
in my Gmail account (the now dreaded list-owner address 
david.kaufman at gmail.com).  Others mailed that address too.

I should announce publicly here and now that sending actual mail to that 
address is of late just about the least effective way of getting my 
attention, due to the astonishingly high volume of spam it now gets 
after having become evolt's public list-owner address.  Gmail
filters about 500 spams a day to it, and allows through almost as many 
false negatives.  Making matters worse, I can seldom find time to check
it two days in a row.  So, when I do, finding real mail in
that steaming pile is difficult and evening-consuming and I confess to 
having been watching Battlestar Galactica more than paying attention 
while shoveling spam from its inbox on more than one occasion.

This address (my personal one, david at gigawatt.com) and my official
evolt inbox, treasurer at evolt.org, though both also old and
well-spammed, are still much better ways of contacting me personally,
since they are delivered all the way to my actual IMAP account.  That
account I aggressively spam-filter
and access using an actual email client (Gmail is innovative and ajaxxy 
but slow as all hell) at least every day or two.  I even sift through 
the piles of spam that make it through my defenses to these addresses, 
manually checking for important-looking mail.

So, if anyone notices (or even just suspects) that the lists are jammed, 
I'd really appreciate it if you could email me at one of those addresses 
with a big "EVOLT LISTS GONE WONKY" sort of subject, so that I stand a 
snowball's chance in hell of doing something about it before the next 
weekend that I find the time to tend to my evolt chores.  Of course I 
monitor the sysadmin list too but, at least when the lists are jammed 
up, that's likely to be a less than useful channel.

Long term, I think it would be great to write a program that monitors 
mail bandwidth *and* alerts me and the other sysadmins (using some 
spam-free conduit) when obvious anomalies occur:

1. inbound volume exceeding outbound.  Unjammed lists should be just the 
reverse, no?

2. overall traffic lower by some set percentage than some rolling 
periodic average (like 30 or 50% lower usage in the last 12 hours than 
the average 12 hours over the last 30 days?)  I understand that MRTG and 
its many addons and mixins excel at this sort of analysis.
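
A rough sketch of those two checks, purely for illustration (nothing
like this is running on the evolt servers, and it is Python rather than
perl).  It assumes some other collector, not shown, already produces
message counts per time window; the function names, the 30% threshold
and the idea of feeding it sixty 12-hour windows (30 days' worth) are
all hypothetical choices, not anything mailman, qmail or MRTG provides.

```python
from statistics import mean


def inbound_exceeds_outbound(inbound, outbound):
    """Check 1: each post fans out to many subscribers, so a healthy
    list server should send far more messages than it receives."""
    return inbound > outbound


def traffic_dropped(history, current, drop_fraction=0.3):
    """Check 2: True if the current window's traffic is more than
    drop_fraction below the average of the earlier windows."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return current < baseline * (1 - drop_fraction)


def find_anomalies(inbound, outbound, history, drop_fraction=0.3):
    """Return a list of human-readable problems for the current window.

    inbound/outbound are message counts for the current window;
    history holds total counts (in + out) for earlier windows.
    """
    problems = []
    if inbound_exceeds_outbound(inbound, outbound):
        problems.append("inbound volume exceeds outbound")
    if traffic_dropped(history, inbound + outbound, drop_fraction):
        problems.append("traffic below rolling average")
    return problems
```

Anything find_anomalies() returns would then go out over whatever
spam-free conduit the sysadmins agree on; that alerting part is
deliberately left out here.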

I'd originally planned to write a script (when all you have is perl's 
hammer, everything looks like a nail) to periodically test the lists 
end-to-end by sending a message to one of them and having an automated 
process receiving list mail watch for it, and set off alarm bells if 
it's not seen within a reasonable time.  But then my first actual outage 
affected just theList (and then just css-discuss) while the other 
low-traffic administrative lists continued to function.  So an
end-to-end testbot like that would have to be subscribed to, and constantly
annoying, *all* the good lists in order to be useful monitoring them, 
which would be a sucky solution and led me to the bandwidth monitoring 
idea.

Anyway, grand schemes notwithstanding, I apologize to all for the recent
problems and vow to redouble my efforts to pay attention, and be 
responsible enough to be able to respond when the lists hit these bumps 
in the future.

Thanks,

-dave



