Friday, March 5, 2010

Backup to The Cloud

I've mentioned before that I was looking into off-site backup options. I also reported that I came to the conclusion that the upload bandwidth provided by my ISP is too low to be useful for such a backup plan. My napkin calculations indicated that I'd need 7 days to upload a full backup snapshot, and 4 to 5 hours to upload an incremental nightly backup.

Well, I'm glad to report that I now have a working off-site backup scheme, based on duplicity, where the nightly backup (to Amazon's S3) typically takes just a few minutes to upload. The data transfer is secured with SSL, and the data itself is compressed and encrypted with GnuPG. Furthermore, the off-site backup is an accurate mirror of the contents of my Bacula backup storage, which contains backups of both my wife's WinXP laptop and my own Debian box.

The initial full backup was a bit of a pain to set up, though. I first did a full restore from Bacula's backup to an external disk, and used duplicity to generate the initial full backup set on my other external disk. This took a few hours to complete, and then I uploaded the full backup set to S3 using s3cmd's sync command. The upload took 8 days to complete, with several interruptions, but it mostly kept a pretty constant rate of about 60KB/s, which is close to the nominal 64KB/s my upload link is supposed to provide.
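For the record, that stage amounted to something like the following (a sketch, not the exact commands I ran; the restore directory and the local backup-set path are placeholders, while the key IDs and bucket match the script below):

duplicity full --encrypt-key=XXXXXXXX --sign-key=YYYYYYYY \
    /mnt/restore/winxp/ file:///mnt/elements/backup/winxp
s3cmd sync /mnt/elements/backup/winxp/ s3://my-bucket/winxp/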

Once the initial upload was done, I added a new job to my Bacula configuration that runs a backup script after the last nightly backup job. In the script (shown below) I use BaculaFS to mount Bacula's storage as a filesystem (for each client and fileset) and then use duplicity to back up its contents. I also back up duplicity's cache directory to an external disk, just in case.
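The Bacula side of this is just a thin hook; roughly an Admin job along these lines (the names here are placeholders, and the other required Job directives are omitted):

Job {
  Name = "OffsiteSync"
  Type = Admin
  Schedule = "NightlyAfterBackups"
  RunAfterJob = "/etc/bacula/scripts/offsite-backup.sh"
  # Client, FileSet, Storage, Pool and Messages directives omitted
}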

Note the use of duplicity's --archive-dir and --name command line options. These options allow the user to control the location of duplicity's cache directory, instead of the default, which is a path name based on the MD5 signature of the target URL. I needed to do this because I moved the backup set from one URL (local disk) to another (S3), and I didn't want duplicity to rebuild the cache by downloading files from S3, but rather use the cache that was already on my box.
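To illustrate, pinning the name means the same cache directory keeps serving the backup set even after the target URL changes (using the winxp set from the script below):

# cache stays at /root/.cache/duplicity/winxp regardless of the target URL;
# without --name it would live under a directory derived from the URL, and
# switching from the local file:// URL to S3 would force a rebuild from S3
duplicity --archive-dir=/root/.cache/duplicity --name=winxp \
    /mnt/baculafs/winxp/ s3+http://my-bucket/winxp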

In the few days I've been using this scheme, I haven't hit any communication failure with S3 during backup. I expect I'll have to modify the script in order to handle such a failure, but I'm not sure how - at the moment I just abort the snapshot if the point-to-point network adapter is down.

The next step is to set up a disk-on-key with the relevant GnuPG encryption keys and S3 access keys, so that I can restore files from S3 in case both my box and my Bacula backup go kaput.
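When that day comes, the restore itself should boil down to something like this (a sketch; it assumes the GnuPG private key has already been imported on the recovery machine, and the restore target directory is arbitrary):

export AWS_ACCESS_KEY_ID=...        # S3 access key, off the disk-on-key
export AWS_SECRET_ACCESS_KEY=...    # S3 secret key, off the disk-on-key
export PASSPHRASE=...               # GnuPG passphrase

# restore the latest winxp snapshot in full
duplicity restore s3+http://my-bucket/winxp /tmp/restore/winxp

# or pull out a single file
duplicity restore --file-to-restore some/path s3+http://my-bucket/winxp /tmp/restore/some-path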

To be continued.


#! /bin/bash

export HOME=/root
SCRIPTDIR=$(dirname $0)
ARCHIVE=$HOME/.cache/duplicity
ARCHIVE_BACKUP=/mnt/elements/backup/duplicity
DEST=s3+http://my-bucket
SRC=/mnt/baculafs
DUPLICITY_OPTS="--encrypt-key=XXXXXXXX --sign-key=YYYYYYYY --archive-dir=$ARCHIVE"
DUPLICITY_LOGLEVEL="-v4"

# make sure we're connected to the internet
if ! grep -q ppp /proc/net/dev
then
    exit 1
fi

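# snapshot <client> <name>: mount the <client>'s Bacula backup for the
# <name> fileset with BaculaFS and mirror it to S3, using <name> as the
# duplicity backup name and destination prefix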
snapshot()
{
    umount $SRC/$2 2>/dev/null

    export AWS_ACCESS_KEY_ID=$(grep access_key $HOME/.s3cfg | cut -d\  -f3)
    export AWS_SECRET_ACCESS_KEY=$(grep secret_key $HOME/.s3cfg | cut -d\  -f3)
    export PASSPHRASE=$(cat $SCRIPTDIR/duplicity-gpgpass)

    echo "Generating list of current files in backup ..."
    duplicity list-current-files -v0 $DUPLICITY_OPTS --name=$2 $DEST/$2 > /tmp/$2.list || exit $?

    echo "Mounting BaculaFS ..."
    baculafs -o client=$1-fd,fileset=$2-fileset,cleanup,prefetch_difflist=/tmp/$2.list $SRC/$2 || exit $?

    echo "Prunning remote snaphsot ..."
    duplicity remove-older-than 1W $DUPLICITY_LOGLEVEL $DUPLICITY_OPTS --name=$2 --force $DEST/$2
    
    echo "Updating remote snaphsot ..."
    duplicity $DUPLICITY_LOGLEVEL $DUPLICITY_OPTS --name=$2 $SRC/$2/ $DEST/$2

    unset PASSPHRASE
    unset AWS_SECRET_ACCESS_KEY
    unset AWS_ACCESS_KEY_ID

    fusermount -u $SRC/$2

    echo "Backup local duplicity archive ..." 
    rsync -av --delete $ARCHIVE/$2/ $ARCHIVE_BACKUP/$2/
}

snapshot winxp winxp
snapshot machine-cycle machine-cycle
snapshot machine-cycle catalog
