[Sysadmin] downtime

David Kaufman david at gigawatt.com
Mon Dec 7 08:14:53 CST 2009


Hi Dean,

Dean Mah wrote:
>> Well, since the -25 kernel worked a few minutes ago, but panics now,
>> I'm even more convinced there is a drive issue.
>>
>> We just moved an active mailing list to this server, so I can't
>> schedule 4 hours of down time. I'll just keep backups on the other drive
>> and if this drive fails we can switch to that one.
>>
>> Thanks, you can close this ticket now.
> 
> Hi David,
> 
> Did you do a disk clone yet?  What is our status going forward?  There
> have been some unrelated problems with the lists' archives that I am
> working through.  If we want to switch back to tempest, let me know
> and I can re-sync the lists back there and do the switch.

I wouldn't use the term "cloned" :-) but I have rsync-ed everything 
interesting (except /dev, /proc and the like) from the primary to the 
secondary drive, and set up a cron job to dump the database to a file and 
then rsync everything daily at 11pm:


   root@tron:~# cat ~/backup.sh
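   #!/bin/bash

   # Dump every database to a single file under /root; the loop below
   # then copies it to the backup drive along with the rest of /root.
   mysqldump -u root -pxxxxxxxx \
     --disable-keys --extended-insert --all-databases \
     > /root/mysql-all.sql

   # Mirror each directory into /backup (the secondary drive) and
   # stop at the first rsync failure.
   for DIR in etc home local root \
     var/cache/bind var/log var/mail var/spool/cron var/spool/postfix
   do
     rsync -a "/$DIR/" "/backup/$DIR/" || exit 1
   done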


Feel free to add anything I've missed that would be useful to have in 
the event that the primary drive suddenly went out!
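
One possible addition, since backup.sh keeps going even if mysqldump fails: a check that the copied dump is non-empty and recent. This is only a minimal sketch. The function name dump_is_fresh and the 25-hour default are made up for illustration and are not part of the script above, and it assumes a find that supports -mmin (GNU and BSD find both do):

```shell
#!/bin/bash

# Hypothetical helper, not part of backup.sh.
# Succeeds only if FILE exists, is non-empty, and was modified within
# the last MAX_AGE_MIN minutes (default 1500, i.e. 25 hours, a little
# more than one daily cycle).
dump_is_fresh() {
  local file=$1
  local max_age_min=${2:-1500}

  # -s is true only when the file exists and has a size greater than zero
  [ -s "$file" ] || return 1

  # find prints the path only if it was modified less than
  # max_age_min minutes ago; empty output means the file is stale
  [ -n "$(find "$file" -mmin "-$max_age_min" -print 2>/dev/null)" ]
}

# Example use against the copy the nightly rsync should have made:
# dump_is_fresh /backup/root/mysql-all.sql \
#   || echo "backup dump is missing, empty, or stale" >&2
```

Because rsync -a preserves modification times, the copy under /backup/root/ should carry the same timestamp as the original dump, so checking either file gives the same answer about age.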

But I doubt the secondary drive can boot... it's never been tested.

Would you feel more comfortable if I go ahead and have them schedule the 
disk maintenance?

I figured: just keep good backups and let the primary drive fail when it 
will.  I'm not confident that they can detect a problem, even when one 
is staring them in the face, and it seemed like planning a disaster is 
as good a way as any to develop and test your disaster recovery plan.

But mostly I just wanted to push forward on this migration and not make 
you redo everything a third (or is it fourth, now?) time.

Let me know what you think.

-dave


