[theforum] Lack of evolt Listmastery (was: Re: [thelist] Tip Harvest for the Week of Monday Oct 09, 2006)
David Kaufman
david at gigawatt.com
Sun Oct 22 02:15:22 CDT 2006
Hi Matt,
Matt Warden <mwarden at gmail.com> wrote:
> On 10/17/06, Chris Hayes <chris at lwcdial.net> wrote:
> thelist has been queuing up periodically for a month or two, now. I
> think Dean Mah used to catch this previously. I don't know who is
> handling it now.
That would be me. Or rather, that would be me, not handling it *well*
now. Dean resigned the job of list-owner and spam-shoveler. After Alan
and I (remember Alan?) volunteered to take over Dean's day-to-day duties
we were shocked by (and I'm still in awe of) the amount of work Dean did.
I'm sure I'll never be able to fill his shoes and give evolt the amount
of daily attention that he did. I'm doing all that I can.
Yes, the lists do "queue up" periodically. In fact, mailman hangs
sometimes, as does qmail, leaving hundreds (or thousands) of messages
piling up in one or another of mailman's or qmail's various queues. I
sifted through Dean's .history file to try and glean some of his
techniques for un-jamming them, but still usually have to resort to
kill -9ing and restarting something and then waiting, and re-checking 10,
30 or sometimes 90 minutes later to see if the plumbing is finally,
slowly unclogging, and then maybe starting over and doing it again
before the data starts flowing freely again (invoke memory of the last 3
minutes of the movie "Tron" here, after the Master Control Program is
destroyed, and the IO tower lights up with the data of all the happy
programs, free once more.)
Although I used to consider myself a competent Linux admin with a few
years under my belt successfully administering both mailman and qmail,
I've been humbled and forced to revisit that world-view after working
with evolt's lists for just this short time. The volume of spam and
viruses that we block, bounces that we process... it's pretty intense. I
volunteered because I knew (or had heard -- "knowing" experientially is
a different thing altogether :-)) that this was the case, and wanted the
on-the-job experience under my belt of having managed a high traffic
list server.
Problem is (as is so often the case) I haven't had the time to delve as
deeply as I'd hoped.
I do still hope to remedy that! But I do also have -- y'know, the
job, the commute (now 150 miles, one way, most weeks all 5 days...
uphill both ways :-)), the kids, the wife and what's left over of the
life to tend to as well. So I hesitate to *promise* that service will
improve Real Soon Now.
Anyone who maybe *does* have the actual high-traffic listmaster chops
(especially those with real-world mailman + qmail + spamassassin
crisis-management experience!) and time to spare is certainly encouraged
to apply for this or any of the other exciting career opportunities at
evolt industries: I for one am seeking list co-owners, list helpers,
list-software/admin tutors, well-wishers and also cheerleaders. The pay
is fairly modest (I shouldn't mention the exact salary in public but
will just say that dividing by it will throw a run-time exception), you
can set your own hours, and can (well, must) telecommute :-)
So, sarcasm aside, the recent list outages are indeed my bad. When I
know about them, I do stay up at night trying to fix them (/glances at
watch). The most recent one I became aware of only after its second day
and after it had already healed (or been fixed by another sysadmin who's
yet to take credit -- we have, I suspect, several such humble good
Samaritans) when I miraculously noticed an email from Eric Meyer
entitled "css-discuss dead (again)" buried under 600 or so spam messages
in my Gmail account (the now dreaded list-owner address
david.kaufman at gmail.com). Others mailed that address too.
I should announce publicly here and now that sending actual mail to that
address is of late just about the least effective way of getting my
attention, due to the astonishingly high volume of spam it now gets
after having become evolt's public list-owner address. Gmail
filters about 500 spams a day to it, and allows through almost as many
false negatives. Making matters worse, I can seldom find time to check
it two days in a row. So, when I do, finding real mail in
that steaming pile is difficult and evening-consuming and I confess to
having been watching Battlestar Galactica more than paying attention
while shoveling spam from its inbox on more than one occasion.
My personal address david at gigawatt.com and my official evolt inbox
treasurer at evolt.org, though both also old and well-spammed, are still
much better methods of contacting me personally since they are delivered
all the way to my actual IMAP account. That account I aggressively spam-filter
and access using an actual email client (Gmail is innovative and ajaxxy
but slow as all hell) at least every day or two. I even sift through
the piles of spam that make it through my defenses to these addresses,
manually checking for important-looking mail.
So, if anyone notices (or even just suspects) that the lists are jammed,
I'd really appreciate it if you could email me at one of those addresses
with a big "EVOLT LISTS GONE WONKY" sort of subject, so that I stand a
snowball's chance in hell of doing something about it before the next
weekend that I find the time to tend to my evolt chores. Of course I
monitor the sysadmin list too but, at least when the lists are jammed
up, that's likely to be a less than useful channel.
Long term, I think it would be great to write a program that monitors
mail bandwidth *and* alerts me and the other sysadmins (using some
spam-free conduit) when obvious anomalies occur:
1. inbound volume exceeding outbound. Unjammed lists should be just the
reverse, no?
2. overall traffic lower by some set percentage than some rolling
periodic average (like 30 or 50% lower usage in the last 12 hours than
the average 12 hours over the last 30 days?) I understand that MRTG and
its many addons and mixins excel at this sort of analysis.
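
For what it's worth, the two checks above could be sketched like this. It is
only a rough illustration in Python, not anything running on evolt's servers:
the function name, the 30% default threshold and the idea of feeding it
per-window message counts are all assumptions, and where those counts come
from (qmail logs, MRTG, whatever) is left open.

```python
from statistics import mean


def find_anomalies(inbound, outbound, history, drop_threshold=0.3):
    """Return a list of warnings for the latest monitoring window.

    inbound, outbound: message counts for the latest window (say, 12 hours)
    history: total message counts for earlier windows of the same length
    drop_threshold: fraction below the rolling average that counts as a drop
    (all names and defaults here are hypothetical)
    """
    warnings = []

    # Check 1: a list server fans each post out to many subscribers,
    # so outbound volume should normally exceed inbound volume.
    if inbound > outbound:
        warnings.append(f"inbound ({inbound}) exceeds outbound ({outbound})")

    # Check 2: compare the latest total against the rolling average.
    if history:
        baseline = mean(history)
        total = inbound + outbound
        if baseline > 0 and total < baseline * (1 - drop_threshold):
            drop = 100 * (1 - total / baseline)
            warnings.append(f"traffic is {drop:.0f}% below the rolling average")

    return warnings
```

Called once per window with, for example, the last 60 twelve-hour totals as
`history`, an empty list back means nothing looks odd; anything else would be
worth sending down whatever spam-free conduit the sysadmins agree on.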
I'd originally planned to write a script (when all you have is perl's
hammer, everything looks like a nail) to periodically test the lists
end-to-end by sending a message to one of them and having an automated
process receiving list mail watch for it, and set off alarm bells if
it's not seen within a reasonable time. But then my first actual outage
affected just theList (and then just css-discuss) while the other
low-traffic administrative lists continued to function. So an
end-to-end testbot like that would have to be subscribed to, and constantly
annoying, *all* the good lists in order to be useful monitoring them,
which would be a sucky solution and led me to the bandwidth monitoring
idea.
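
For the record, the bookkeeping half of that end-to-end testbot idea is
simple enough. Here is a rough sketch in Python rather than perl; the
function names, the "list-probe" subject format and the 30-minute default
are made up for illustration, and the actual sending and receiving of mail
is deliberately left out.

```python
import time
import uuid
from email.message import EmailMessage


def make_probe(list_address, sender):
    """Build a probe message carrying a unique token in its subject."""
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = f"list-probe {token}"  # hypothetical subject format
    msg.set_content("Automated end-to-end list test; please ignore.")
    return token, msg


def overdue_probes(sent, seen, now=None, max_delay=1800):
    """Return tokens sent more than max_delay seconds ago and not yet seen.

    sent: dict mapping token -> time.time() value when the probe was sent
    seen: set of tokens found in mail that came back from the list
    """
    now = time.time() if now is None else now
    return [token for token, when in sent.items()
            if token not in seen and now - when > max_delay]
```

The message from `make_probe` could be handed to something like
`smtplib.SMTP("localhost").send_message(msg)`, and a subscribed mailbox would
collect the tokens for `seen`. Any token coming back from `overdue_probes`
means that list is probably jammed. It still has the drawback described
above: one probe subscription per list.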
Anyway, grand schemes notwithstanding, I apologize to all for the recent
problems and vow to redouble my efforts to pay attention, and be
responsible enough to be able to respond when the lists hit these bumps
in the future.
Thanks,
-dave