Friday, December 25, 2009

Backup Revisited: Disk Full

The nightly Bacula backup failed a few nights ago. The reason was simple - the external backup disk was full.

A full backup weighs close to 30GB (and it's steadily growing). A differential/incremental backup weighs about 500MB, on average. So, with a file retention period of 4 months, the storage space needed for backup is about 4×(30+30×0.5)=180GB - for each month, one 30GB full backup plus roughly 30 nightly backups at 0.5GB each.

I've configured Bacula's maximum volume size to 4GB (do read the fine manual). This means that it'll divide the backup archive into chunks of no more than 4GB in size. This allows Bacula to recycle volumes once their contents are no longer needed, i.e. once everything they contain is older than the retention period.
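For reference, this is roughly what the relevant Pool resource looks like in bacula-dir.conf - just a sketch, with a made-up pool name and label format rather than my actual configuration:
Pool {
  Name = MyClient-Pool            # illustrative name
  Pool Type = Backup
  Label Format = "vol-"           # auto-label new volumes
  Maximum Volume Bytes = 4G       # split the backup archive into volumes of at most 4GB
  Volume Retention = 4 months     # a volume may be recycled once everything on it is older than this
  Recycle = yes
  AutoPrune = yes
}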

I've also configured Bacula to use separate pools of volumes for the monthly full backup jobs and for the nightly incremental/differential backup jobs. It seemed like a good idea at the time. It wasn't.

Bacula does not recycle volumes before it actually needs them. This means that I ended up with leftover volumes on disk that were no longer needed and would only be recycled by the next backup job. And since I had split the volumes into two pools per client, the full backup's leftover volumes sat on disk for a whole month before being recycled, only to be replaced by other, more recent, leftover volumes. The overhead is about one volume per full backup, which for two clients amounts to 8GB.

Furthermore, I use the same disk to store the VirtualBox disk image of my virtual WinXP PC. That's about 15GB.

The disk capacity is 230GB, but 1 percent of this disk is used by the OS - that's 2.3GB down the drain.

That leaves me with close to 25GB of slack, which doesn't seem too bad, but is actually pretty bad. The problem is that Bacula, by default, will perform a full backup whenever it detects that the fileset, i.e. the list of files/directories that are included in or excluded from each backup job, has been modified. And, as you can imagine, I did just that, at least once, during the past few months.

I've had to reconfigure Bacula as follows (a sketch of the adjusted Pool settings appears after the list):
  1. use a single backup pool per client (I could merge both clients into a single pool, but keeping each client's volumes separate seems like a more robust approach) - this should reduce the recycling overhead, because volumes will now be recycled more often
  2. reduce the volume size to 700MB, in an attempt to lower the leftover overhead even more, by lowering the chance that a volume contains files from different backup jobs (another, more accurate, approach is to set the Maximum Volume Jobs to 1)
  3. reduce the retention period to 2 months (actually, I've never had to restore files older than a week or so, but... better safe than sorry)
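In terms of the Pool resource sketched above, the change boils down to something like this (again with made-up names, and with the item 2 alternative shown commented out):
Pool {
  Name = MyClient-Pool            # one pool per client, shared by full and incremental/differential jobs
  Pool Type = Backup
  Maximum Volume Bytes = 700M     # smaller volumes, less leftover waste
  # Maximum Volume Jobs = 1       # the more accurate alternative mentioned in item 2
  Volume Retention = 2 months     # shorter retention period
  Recycle = yes
  AutoPrune = yes
}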

I stopped the Bacula director daemon
invoke-rc.d bacula-director stop
erased all the backup volumes, reset the Bacula database (aka the catalog)
/usr/share/bacula-director/make_sqlite3_tables
(yes, I'm using the SQLite3 backend), started the director daemon, and then used bconsole to manually launch backup jobs for both my wife's PC and my own.
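Put together, the reset amounted to something like this (the volume directory and the bconsole job name are placeholders, not my actual paths):
invoke-rc.d bacula-director stop
rm /path/to/backup/volumes/*                      # placeholder: wherever the volumes live
/usr/share/bacula-director/make_sqlite3_tables    # recreate the (empty) SQLite3 catalog
invoke-rc.d bacula-director start
bconsole                                          # then, e.g.: run job=MyClientBackup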

Planning ahead is a good idea. It's only that I realized this fact too late.

Friday, December 18, 2009

Recovering From a Bad Kernel Upgrade on QEMU PowerPC

A recent comment prompted me to launch my QEMU-hosted PowerPC virtual machine. Once launched, I couldn't just shut it down; I had an unrelenting urge to upgrade it.

I launched aptitude as root, hit u to update the packages list. Later, I hit SHIFT-u to mark all upgradable packages for installation, hit g, reviewed the list of actions that aptitude was about to perform, and hit g again, in order to actually start the upgrade.
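(For the record, this interactive sequence is roughly equivalent to the following shell commands, except that the TUI lets me review and adjust the planned actions before committing.)
aptitude update
aptitude full-upgrade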

That's a routine ritual that I perform almost every day on my Debian/Testing laptop, usually with no ill effects to either my home computer or my questionable sanity. It pretty much just works.

Now, my PowerPC VM runs Debian/Stable - I expected no problem at all. Nevertheless, sh*t did happen.

The Kernel was also upgraded in the process, which triggered an update of the initrd image. Nothing unusual.

I rebooted the VM and eventually got the login prompt. I typed root as the username, and my password at the subsequent password prompt, and got "Login incorrect".

OK, so I mistyped the password, nothing unusual here. I tried it again. But then something quite unusual happened: when I typed root I got this string echoed back at me: ^]r^]o^]o^]t, and I couldn't login.

I switched to the next console with ALT-right arrow and tried it again, with exactly the same results. I couldn't login.

Now, since there's no SSH daemon running on this VM, I had no other way of logging in.

I couldn't find a relevant bug report on the Debian BTS. I hate it when this happens.

The solution to my problem was obvious: downgrade the Kernel. But how? After all, I had to login first.

Normally, on an x86 PC or VM running Debian, with GRUB as the boot-loader, you still have the option to boot into the previous Kernel, until it is explicitly uninstalled. With a PowerPC VM running Quik as the boot-loader, there's no such option. What a PITA.

There's always the option to discard the disk image and start over with a fresh install - but I had customized the disk image enough to make that rather painful.

I decided to attempt to boot the VM from an old image of the Debian PowerPC installation CD that was lying around on my hard disk, and see if it gets me anywhere:
qemu-system-ppc debian_lenny_powerpc_small.qcow -cdrom debian-501-powerpcinst.iso -boot d
I tried it twice before I figured out what I had to do in order to mount the disk image and attempt to repair it:
  1. hit ENTER at the boot prompt to start the Debian installer
  2. select the defaults, continue until you reach the hardware detection stage, and wait
  3. when prompted to configure the host name, go back until you get a menu with an option to detect disks, select it and wait
  4. go back until you get the menu with an option to open a shell - select it
  5. type the following at the shell prompt:
    mount /dev/hda2 /media
    cd /media
The /dev/hda2 partition happens to be the Linux boot partition on this VM. I inspected its contents and found that while there was only a single Kernel image there, there were two initrd images - both with the same base name, but one of them with an extra .bak file extension.

This was, apparently, a backup copy of the old initrd image, made during the previous upgrade. I had a hunch that the old initrd image might still work with the new Kernel, because it had the same version number. I hoped that my problem was with the initrd image, rather than with the new Kernel itself.

I swapped the images
mv initrd.img-2.6.26-2-powerpc.bak initrd.img-2.6.26-2-powerpc.good
mv initrd.img-2.6.26-2-powerpc initrd.img-2.6.26-2-powerpc.bak
mv initrd.img-2.6.26-2-powerpc.good initrd.img-2.6.26-2-powerpc
cd /
sync
umount /media
shut down the VM, and started it again without the install CD.

It actually worked: I managed to login!

Wow.

The next thing to do was to downgrade the Kernel (a rough command-line equivalent is sketched after the list):
  1. launch aptitude
  2. select to install the older Kernel version (luckily, it was still available in the packages list)
  3. install it
  4. select the newest Kernel version (the one that caused all this grief)
  5. forbid it by pressing SHIFT-f, so that next time I don't upgrade to this specific version by mistake
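For the record, the same downgrade can be done from a shell along these lines; the package name matches the initrd image above, but the two version strings are placeholders I made up - check the versions actually available (e.g. with apt-cache policy) before copying anything:
aptitude install linux-image-2.6.26-2-powerpc=2.6.26-19    # the (hypothetical) older, working package version
aptitude forbid-version linux-image-2.6.26-2-powerpc=2.6.26-21    # forbid the (hypothetical) broken version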
I rebooted the VM and found that I could still login.

I guess I should've investigated this further and submitted a proper bug report, but running QEMU on my slow box is such a bucket of pain that I'd rather avoid any more of it.

Friday, December 11, 2009

Anonymous Browser Uploads to Amazon S3

I've joined the Cloud.

I've signed up for the Amazon Simple Storage Service (aka S3). It costs nothing when unused, and almost nothing when used.

My original motivation for signing up was the potential for off-site backup. You know, just in case. The worst case.

But cheap remote storage isn't enough - what I hadn't considered at all, when I signed up, was bandwidth. The upload bandwidth that my ISP provides me, for the price I'm willing to pay, is a measly 512Kbit/s. Consider uploading a 35GB snapshot via this narrow straw of a connection. I'll let you do the math. Bottom line is that it seems I won't be using S3 for backup.
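(For the curious, the math goes roughly like this: 35GB is about 280,000Mbit, and at roughly 0.5Mbit/s that's on the order of 550,000 seconds - over six days of non-stop uploading, assuming the link even sustains its full rate.)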

But now that I've already signed up for the service, I started looking for other ways of using it: file sharing (of the legal kind) of files that are too large to send/receive as e-mail attachments.

After some digging I found S3Fox Organizer, which provides easy access to S3 from within Firefox. It allowed me to create buckets and folders, upload files, and generate time-limited URLs that I could distribute to friends and family members, so that they could download these files.

It works, but it's rather cumbersome when compared to Picasa, YouTube, SkyDrive, etc. And, while cheap, it ain't free.

And it's unidirectional - I could only send files.

Receiving files to my S3 account seemed to require web development karma that I don't possess. Luckily, after some more digging, I found a relevant article at the AWS Developer Community website: Browser Uploads to S3 using HTML POST Forms. The accompanying thread of reader comments is even more useful than the article itself, since it provided a ready-made PHP script for generating a working, albeit rather spartan, browser upload interface:
  1. prerequisites:
    1. AWS S3 account
    2. create a storage bucket and an upload folder under that bucket (I did this with S3Fox)
    3. PHP-enabled web server (see this howto for example) that will host the upload script (I host my server on my home computer)
  2. download getMIMEtype.js and place it in the document root directory
  3. place the following PHP script in the document root directory, as s3upload.php
  4. edit the script and plug in your own AWS access key, AWS secret key, upload bucket name, upload folder name, and maximum file size (currently set at 50MB)
  5. share a link to this script with anyone you want to get files from

And here's the script itself:

<?PHP
// Send a file to the Amazon S3 service with PHP
//
// Taken, except for some fixes, from
// http://developer.amazonwebservices.com/connect/message.jspa?messageID=111726#111726
// which refers to the article at
// http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
//
// Puts up a page which allows the user to select a file and send it directly to S3,
// and calls this same page with the results when completed


// Change the following to correspond to your system:
$AWS_ACCESS_KEY = 'XXXXXXXXXXXXXXXXXXXX';
$AWS_SECRET_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
$S3_BUCKET = 'my-upload-bucket';
$S3_FOLDER = 'uploads/'; // folder within bucket
$MAX_FILE_SIZE = 50 * 1048576; // 50MB size limit, in bytes
$SUCCESS_REDIRECT = 'http://' . $_SERVER['SERVER_NAME'] .
    ($_SERVER['SERVER_PORT'] == '' ? '' : ':') . $_SERVER['SERVER_PORT'] .
    '/' . 's3upload.php'/*$_SERVER['PHP_SELF']*/ .
    '?ok'; // s3upload.php is the URL of this script, relative to the server root

// create document header
echo '
<html>
  <head>
    <title>S3 POST Form</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <script type="text/javascript" src="./getMIMEtype.js"></script>
    <script type="text/javascript">
    function setType(){
        document.getElementById("Content-Type").value = getMIMEtype(document.getElementById("file").value);
    }
    </script>
  </head>

  <body>
';

// process result from transfer, if query string present
$qBucket = '';
$qKey = '';
$query = $_SERVER['QUERY_STRING'];
$res = explode('&', $query);
foreach($res as $ss) {
    //echo 'ss: ' . $ss . '<BR/>';
    if(substr($ss,0,7) == 'bucket=') $qBucket = urldecode(substr($ss,7));
    if(substr($ss,0,4) == 'key=') $qKey = urldecode(substr($ss,4));
}
if($qBucket != '') {
    // show transfer results
    echo 'File transferred successfully!<BR/><BR/>';
    $expires = time() + 1*24*60*60/*$expires*/;
    $resource = $qBucket."/".urlencode($qKey);
    $stringToSign = "GET\n\n\n$expires\n/$resource";
    //echo "stringToSign: $stringToSign<BR/><BR/>";
    $signature = urlencode(base64_encode(hash_hmac("sha1", $stringToSign, $AWS_SECRET_KEY, TRUE/*raw_output*/)));
    //echo "signature: $signature<BR/><BR/>";
    $queryStringPrivate = "<a href='http://s3.amazonaws.com/$resource?AWSAccessKeyId=$AWS_ACCESS_KEY&Expires=$expires&Signature=$signature'>$qBucket/$qKey</a>";
    $queryStringPublic = "<a href='http://s3.amazonaws.com/$qBucket/$qKey'>http://s3.amazonaws.com/$qBucket/$qKey</a>";

    echo "URL (private read): $queryStringPrivate<BR/><BR/>";
    echo "URL (public read) : $queryStringPublic<BR/><BR/>";
}

// setup transfer form
$expTime = time() + (1 * 60 * 60); // now plus one hour (1 hour; 60 mins; 60secs)
$expTimeStr = gmdate('Y-m-d\TH:i:s\Z', $expTime);
//echo 'expTimeStr: '. $expTimeStr ."<BR/>";
//echo 'SUCCESS_REDIRECT: '. $SUCCESS_REDIRECT ."<BR/>";

// create policy document
$policyDoc = '
{"expiration": "' . $expTimeStr . '",
  "conditions": [
    {"bucket": "' . $S3_BUCKET . '"},
    ["starts-with", "$key", "' . $S3_FOLDER . '"],
    {"acl": "private"},
    {"success_action_redirect": "' . $SUCCESS_REDIRECT . '"},
    ["starts-with", "$Content-Type", ""],
    ["starts-with", "$Content-Disposition", ""],
    ["content-length-range", 0, ' . $MAX_FILE_SIZE . ']
  ]
}
';

//echo "policyDoc: " . $policyDoc . '<BR/>';
// remove CRLFs from policy document
$policyDoc = implode(explode("\r", $policyDoc));
$policyDoc = implode(explode("\n", $policyDoc));
$policyDoc64 = base64_encode($policyDoc); // encode to base 64
// create policy document signature
$sigPolicyDoc = base64_encode(hash_hmac("sha1", $policyDoc64, $AWS_SECRET_KEY, TRUE/*raw_output*/));

// create file transfer form
echo '
<form action="https://' . $S3_BUCKET . '.s3.amazonaws.com/" method="post" enctype="multipart/form-data">
  <input type="hidden" name="key" value="' . $S3_FOLDER . '${filename}">
  <input type="hidden" name="AWSAccessKeyId" value="' . $AWS_ACCESS_KEY . '">
  <input type="hidden" name="acl" value="private">
  <input type="hidden" name="success_action_redirect" value="' . $SUCCESS_REDIRECT . '">
  <input type="hidden" name="policy" value="' . $policyDoc64 . '">
  <input type="hidden" name="signature" value="' . $sigPolicyDoc . '">
  <input type="hidden" name="Content-Disposition" value="attachment; filename=${filename}">
  <input type="hidden" name="Content-Type" id="Content-Type" value="">

  File to upload to S3:
  <input name="file" id="file" type="file">
  <br/><br/>
  <input type="submit" value="Upload File to S3" onClick="setType()">
</form>
';

// create document footer
echo '
  </body>
</html>
';
?>

Friday, December 4, 2009

Auto Restart of Daemons after Upgrades

I use checkrestart to check which processes need to be restarted after an upgrade. This handy script is part of the debian-goodies package.

checkrestart not only lists processes that need to be restarted, but also attempts to deduce which service/daemon each process belongs to, and lists the associated init scripts that have to be restarted. It's not perfect, but most of the time it gets the right results.

For a long while I followed a manual routine:
  1. upgrade packages
  2. manually run checkrestart (as root)
  3. restart the suggested init scripts, e.g.
    /etc/init.d/shorewall restart
  4. repeat from step 2 until checkrestart reports
    Found 0 processes using old versions of upgraded files

Tips:
  1. restarting GDM means that you lose your current X session - pay attention!
  2. but, restarting your current X session is probably the cleanest way to remove desktop related processes running old files (e.g. tray icons)
  3. if sshd needs to be restarted, you'll also have to disconnect all current ssh sessions
  4. with some processes (e.g. perl and python) it's necessary to look at the command line in order to figure out what needs to be restarted:
    ps -p <process-id> -o pid= -o cmd=
  5. some processes (e.g. console-kit-daemon) require restarting dbus (see Debian bug #527846 - and it seems that it's advisable to also restart GDM afterwards); the relevant commands are sketched after this list
  6. some upgrades (kernel, GRUB, libc) probably require a reboot
  7. you may still end up having to manually close some applications (e.g. mutt, emacs) or kill some stubborn processes, depending on the specific packages that were upgraded
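For tip 5, the restarts boil down to something like this (standard Debian init script names at the time - adjust to your own setup):
invoke-rc.d dbus restart
invoke-rc.d gdm restart    # careful: as per tip 1, this kills your current X session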

I've automated some of this with the following script:
#! /bin/bash
# restart every init script that checkrestart suggests, except GDM
/usr/sbin/checkrestart | grep -e "/etc/init\.d/[^\ ]* restart" | grep -v "/etc/init.d/gdm restart" |
while read cmd; do
    echo "${cmd}"
    eval ${cmd}
done
# run checkrestart again; for lines that look like "<pid> <program>",
# show the full command line of the offending process, otherwise print the line as-is
/usr/sbin/checkrestart |
/usr/bin/awk \
'
{
    if ( NF == 2 && $1 ~ /^[0-9]+/ )
        system("ps -p "$1" -o pid= -o cmd=");
    else
        print $0
}
'
which runs automatically after installing/upgrading packages. This is accomplished by adding the following line to /etc/apt/apt.conf.d/99local (create this file if necessary):
DPkg::Post-Invoke { "if [ -x /usr/sbin/checkrestart ] && [ -x /path/to/checkrestart/script ]; then /path/to/checkrestart/script; fi;"; };
Note that I specifically avoid restarting GDM automatically (you may want to add a similar check for your own favorite display manager).