Friday, December 18, 2009

Recovering From a Bad Kernel Upgrade on QEMU PowerPC

A recent comment prompted me to launch my QEMU-hosted PowerPC virtual machine. Once it was running, I couldn't just shut it down: I had an unrelenting urge to upgrade it.

I launched aptitude as root and hit u to update the package list. Then I hit SHIFT-u to mark all upgradable packages for installation, hit g, reviewed the list of actions that aptitude was about to perform, and hit g again to actually start the upgrade.

That's a routine ritual that I perform almost every day on my Debian/Testing laptop, usually with no ill effects, neither to my home computer, nor to my questionable sanity. It pretty much just works.

Now, my PowerPC VM runs Debian/Stable - I expected no problem at all. Nevertheless, sh*t did happen.

The Kernel was also upgraded in the process, which triggered an update of the initrd image. Nothing unusual.

I rebooted the VM and eventually got the login prompt. I typed root as the username, and my password at the subsequent password prompt, and got "Login incorrect".

OK, so I mistyped the password, nothing unusual here. I tried it again. But then something quite unusual happened: when I typed root I got this string echoed back at me: ^]r^]o^]o^]t, and I couldn't log in.

I switched to the next console with ALT-right arrow and tried it again, with exactly the same results. I couldn't log in.

Now, since there's no SSH daemon running on this VM, I had no other way of logging in.

I couldn't find a relevant bug report on the Debian BTS. I hate it when this happens.

The solution to my problem was obvious: downgrade the Kernel. But how? After all, I had to log in first.

Normally, on an x86 PC or VM running Debian, with GRUB as the boot-loader, you still have the option to boot into the previous Kernel, until it is explicitly uninstalled. With a PowerPC VM running Quik as the boot-loader, there's no such option. What a PITA.

There's always the option to discard the disk image and start over with a fresh install, but I had customized the disk image enough to make that rather painful.

I decided to attempt to boot the VM from an old image of the Debian PowerPC installation CD that was lying around on my hard disk, and see if it gets me anywhere:
qemu-system-ppc debian_lenny_powerpc_small.qcow -cdrom debian-501-powerpcinst.iso -boot d
I tried it twice before I figured out what I had to do in order to mount the disk image and attempt to repair it:
  1. hit ENTER at the boot prompt to start the Debian installer
  2. select the defaults, continue until you reach the hardware detection stage, and wait
  3. when prompted to configure the host name, go back until you get a menu with an option to detect disks, select it and wait
  4. go back until you get the menu with an option to open a shell - select it
  5. type the following at the shell prompt:
    mount /dev/hda2 /media
    cd /media
The /dev/hda2 partition happens to be the Linux boot partition on this VM. I inspected its contents and found that while there was only a single Kernel image there, there were two initrd images - both with the same name, but one of them with a .bak file extension.
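
The inspection described above can be sketched as a small shell function. This is only an illustration: `list_initrd_backups` is a made-up helper name, and the default directory `/media` is simply the mount point used in the steps above.

```shell
# List every initrd image in a boot directory and note whether a ".bak"
# copy sits next to it. Hypothetical helper, not part of Debian.
list_initrd_backups() {
    dir=${1:-/media}    # assumed mount point of the boot partition
    for img in "$dir"/initrd.img-*; do
        # Skip the backup copies themselves; they are reported with their original.
        case $img in
            *.bak) continue ;;
        esac
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$img" ] || continue
        if [ -f "$img.bak" ]; then
            echo "$img (backup: $img.bak)"
        else
            echo "$img (no backup)"
        fi
    done
}
```

Calling `list_initrd_backups /media` prints one line per initrd image, which makes a leftover backup easy to spot.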

This was, apparently, a backup copy of the old initrd image, made during the previous upgrade. I had a hunch that the old initrd image might still work with the new Kernel, because it had the same version number. I hoped that my problem was with the initrd image, rather than with the new Kernel itself.

I swapped the images:
mv initrd.img-2.6.26-2-powerpc.bak initrd.img-2.6.26-2-powerpc.good
mv initrd.img-2.6.26-2-powerpc initrd.img-2.6.26-2-powerpc.bak
mv initrd.img-2.6.26-2-powerpc.good initrd.img-2.6.26-2-powerpc
cd /
sync
umount /media
Then I shut down the VM and started it again without the install CD.
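
For what it's worth, the three-way rename above can be wrapped in a function that refuses to do anything unless both files exist. This is a minimal sketch, not something I actually ran in the installer shell; `swap_with_backup` is a hypothetical name and the temporary `.swap.$$` suffix is an arbitrary choice.

```shell
# Swap a file with its ".bak" sibling: the backup becomes the live copy
# and the live copy becomes the new backup. Hypothetical helper.
swap_with_backup() {
    live=$1
    backup=$1.bak
    tmp=$1.swap.$$      # temporary name, made unique with the shell's PID

    # Refuse to touch anything unless both files are present.
    [ -f "$live" ] || { echo "missing: $live" >&2; return 1; }
    [ -f "$backup" ] || { echo "missing: $backup" >&2; return 1; }

    # Each rename only runs if the previous one succeeded.
    mv "$backup" "$tmp" &&
    mv "$live" "$backup" &&
    mv "$tmp" "$live"
}

# Example, with the path as mounted in the steps above:
# swap_with_backup /media/initrd.img-2.6.26-2-powerpc && sync
```

Running it a second time swaps the files back, which is handy if the old initrd turns out not to be the fix.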

It actually worked: I managed to log in!

Wow.

The next thing to do was to downgrade the Kernel:
  1. launch aptitude
  2. select to install the older Kernel version (luckily, it was still available in the packages list)
  3. install it
  4. select the newest Kernel version (the one that caused all this grief)
  5. forbid it by pressing SHIFT-f, so that next time I don't upgrade to this specific version by mistake
I rebooted the VM and found that I could still log in.
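
The same steps can probably be done from the aptitude command line, using `aptitude install package=version` and `aptitude forbid-version package=version`. The sketch below only prints the commands instead of running them. The function name is made up, the package name `linux-image-2.6.26-2-powerpc` is my assumption based on the initrd file name, and the version strings are placeholders, since I did not record the real ones.

```shell
# Print (rather than run) aptitude commands that roughly mirror the
# interactive steps above. Hypothetical helper; all arguments are
# supplied by the caller.
kernel_downgrade_commands() {
    pkg=$1     # kernel package name (assumed: linux-image-2.6.26-2-powerpc)
    good=$2    # older version that worked (placeholder)
    bad=$3     # newer version that broke login (placeholder)
    echo "aptitude install $pkg=$good"
    echo "aptitude forbid-version $pkg=$bad"
}

# Example with placeholder versions:
# kernel_downgrade_commands linux-image-2.6.26-2-powerpc OLD_VERSION NEW_VERSION
```

The available version strings can be looked up with `apt-cache policy` for the package before running the printed commands as root.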

I guess I should've investigated this further and submitted a proper bug report, but running QEMU on my slow box is such a bucket of pain that I'd rather avoid any more of it.
