[Sysadmin] downtime

David Kaufman david at gigawatt.com
Sun Dec 6 23:26:18 CST 2009


There was a bit of downtime this evening on tron as I upgraded the 
kernel and it did not reboot.

Support ticket follows:


David Kaufman 	12/6/2009 10:23:38 PM
Hardware Object: evoltorg (67.19.208.10)

I upgraded the kernel on this machine but it did not come back online
after the reboot.

Please take a look at the console for me, might be a grub error. You
should see both the old (2.6.24-24) and new (2.6.24-26) kernels in the
list.

Please try booting the new -26 kernel and if that fails: let me know
what error message you get and then reboot the old -24 kernel.

Obviously if you're able and willing to fix the problem with the -26
config, feel free :-)

Thanks!

-dave


Scott C. 	12/6/2009 10:28:25 PM

We are looking into this for you now.


Scott C. 	12/6/2009 10:40:06 PM

Problem: The server did not come back online after your reboot.

Error: As soon as grub started to load the new kernel it had a kernel
panic.

"run-init: /sbin/init: No such file or directory.
[ 68.095605] Kernel panic - not syncing: Attempting to kill init!"

Action: I had to hard reset the server. When the grub came back up it
would only boot to another kernel. I chose the second latest one to boot
from.

Result: The server is now back online and accessible remotely.


David Kaufman 	12/6/2009 10:53:58 PM

Thanks Scott,

I see it's back up, running the -25 kernel.

What do you mean by "When the grub came back up it would only
boot to another kernel"? Was it that:

1. the -26 kernel was not shown in the grub menu,
2. you selected the -26 kernel from the grub menu but it panicked again,
3. or something else?

The error "/sbin/init: No such file or directory" means that it was
unable to mount the root partition. I've suspected a disk problem on
the primary drive for a while now (see the other ticket that I just
asked to be closed).

Could you try the reboot one more time, and see if the -26 kernel will
boot this time?

Thanks


Scott C. 	12/6/2009 10:57:29 PM

Please stand by while we look into this.


Scott C. 	12/6/2009 11:08:00 PM

What was meant earlier is that the default kernel, -26, would panic
when attempting to boot.

The server is also kernel panicking at the same place with kernel -25.
It had to be booted to the -24 kernel this time.

Since you suspect hard drive issues, we can schedule hard drive
diagnostics for you. Please let us know a date and time you would like
this to be performed. We require a four-hour window for the diagnostics,
and the server needs to be offline during that time.


David Kaufman 	12/6/2009 11:17:01 PM

Well, since the -25 kernel worked a few minutes ago, but panics now,
I'm even more convinced there is a drive issue.

We just moved an active mailing list to this server, so I can't
schedule 4 hours of downtime. I'll just keep backups on the other drive
and if this drive fails we can switch to that one.

Thanks, you can close this ticket now.


