[theforum] [was-thelist] evolt.org down?

Jeff Howden jeff at jeffhowden.com
Tue Apr 29 08:09:15 CDT 2003


> From: Martin
> Firstly thankyou for putting in the work to getting weo
> (and presumably all the other sites you host) back up
> and running.

yes, a few, some of which have ecommerce.

one client called me before the first warning hit my pager.

> One 20second email would have saved
> a) Lots of people getting worried
> b) Your time in responding to this thread at 5am

if people are worried, an email to me would've gotten some information out.

i'll do my best to email when a problem comes up in the future, but i can't
promise i'll remember.

when i first noticed the problem it just seemed like a "get some alternate
power to the servers" sort of problem, a 5 minute solution plus the 10
minutes to drive to the colo.  then, it turned out to be an "ok, power is on,
why are 50-100% of the packets being dropped" problem which seemed like
another 10-15 minute fix.  once that was addressed, it was an "oh, cfserver
is puking" and by that time i was deep in problem solving mode and 100%
aimed at getting things fixed and back up as quickly as possible.

> The most stressful ones (and the ones where the people
> working on it got most criticised) were the ones where
> no-one else knew what was going on, or when things would
> be resolved (or at least the scale of work involved).

the street goes both ways.  if people are worried, fire off a little email
asking if everything is ok or if something is going on.  that'll serve
nicely as a reminder to me to give a status update.  then, someone else can
act as my liaison to everyone else so i can continue to keep working.

> There's a direct correlation there. No info=no
> confidence=stress=criticism (deserved or otherwise). And
> you've been around this organisation long enough to know
> that - it goes with the territory of hosting evolt.

actually, around here it's more like attitude=no confidence=criticism.  it
really has nothing to do with whether or not there's info being shared.  and
we've both been around long enough to know that's how it really is.

> This - after all - is what sysadmin is supposed to be
> about: giving the rest of the organisation confidence
> that any unplanned downtime is being handled. How it's
> being done needn't generally get out of sysadmin, but
> sysadmin *must* be part of it, and the rest of the org
> *must* be kept informed.

i don't think i've gotten a single thing from sysadmin stating how various
services we'd planned on continuing to support were being addressed
post-move.  there are still lots of things with regard to lists,
lists.evolt.org website, etc. that still don't work.  kudos to dean for
busting his ass to get as much done as he did (and will for the little bit
of work that beo needed), but i don't see anyone else being accountable for
work that needs to get done.

where's a status report of outstanding issues, proposed solutions to those
issues, and timelines?

> > btw, it's now 5am here.   i have a very important appt
> > at 8am that can't be missed
> Is there any way to ensure that the sysadmin team could
> share the burden, so handling future downtime doesn't
> mean <3hrs sleep for you?

actually, there isn't really much point in me trying to get any sleep before
this appt.  i'll just take the day off work and get some sleep after the
appt.

since evolt.org isn't the only thing on my server, i'm hesitant to just
start handing out login accounts for it.  however, if anyone from sysadmin
wants to approach me and show that they have some win2k server admin
experience, i'll definitely take it under consideration.

if no one does, well, i'll understand.  i don't want this to seem like i'm
complaining about having to work on this all night.  i guess i'm just trying
to figure out why a simple thank you is so difficult.

the irony is that this downtime didn't cost evolt.org a dime (kinda like the
hosting), yet evolt.org is the only client on the server that had any
complaints about the downtime or how it was handled.  the client on the
server that was losing possible sales during this downtime called to ask if
the server was down and then thanked me when they noticed it was back up
(and they pay a healthy amount for hosting each month).  they didn't have a
single negative thing to say.  they know i gave up sleep to make sure things
got back up as soon as possible.  like i said, how ironic.



NOTICE:  members.evolt.org web and email addresses are changing!
| OLD:                            | NEW:                            |
| jeff at members.evolt.org          | evolt at jeffhowden.com            |
| http://members.evolt.org/jeff/  | http://evolt.jeffhowden.com/    |

