Thursday, April 26, 2007

Backup/Extract/Convert Outlook Express Messages

[18 Jun. 2008] UPDATE: I now use UnDBX to facilitate fast incremental backups of DBX files.

My wife's email client of choice is Outlook Express. That, in itself, is OK with me. Except that she doesn't delete messages. Ever. She just can't be bothered. Her OE storage folder currently takes up 1.4GB of disk space.

The real problem is backup. All the messages in each OE message folder are stored in a single monolithic dbx file. This means that every day, during the incremental backup, Bacula encounters very large files that were modified and need to be backed up in full. A daily incremental backup of over 1GB is unacceptable, since my storage medium is a 60GB hard disk that needs to hold the full and incremental backups of both our computers.

The solution I came up with was pretty simple: extract all the email messages from the dbx files to a different folder, so that Bacula only needs to back up new email messages during an incremental backup. Little did I know how difficult it would be to set up such a scheme.

It took me quite a while to find a free command line tool that can extract eml files from dbx files. Searching Google for "extract eml dbx" or "convert eml dbx" brings up a lot of links to shareware tools, and most of these cannot be used from a script.

I tried using tools like xdelta to build and back up binary delta files, but this proved to be problematic - all the tools I tried required too much memory and took so long to run that they were impractical.

I even started toying with the idea of writing such a tool. Finally, during my search for the OE dbx file format specification, I found DbxConv - a nice little utility that does exactly what I wanted it to do.

The complete solution is a bit more complex than just running DbxConv. Before every backup job, the Bacula Director Daemon instructs the Bacula File Daemon on my wife's computer to run a VB script (available here), as specified in /etc/bacula/bacula-dir.conf:

Job {
  # ... other Job directives omitted ...
  ClientRunBeforeJob = "c:/windows/system32/cscript.exe c:/backup/tools/run-before-job.vbs %n"
}

This script attempts to shut down Outlook Express, calls DbxConv to extract eml messages from the dbx files to a scratchpad folder, and then uses cygwin's rsync utility to synchronize the content of the scratchpad folder with an eml storage folder that is marked for backup in the Bacula Director's configuration. The scratchpad folder is then erased, and the backup process continues.
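To illustrate why the synchronization step matters, here is a minimal Python sketch of the idea. It is not the VB script, and it does not call rsync or DbxConv; the function name sync_extracted and the folder layout are hypothetical. It assumes files are compared by content, so a message that was extracted again but did not change is left alone in the storage folder, and only new or changed messages look modified to the backup.

```python
import filecmp
import shutil
from pathlib import Path


def sync_extracted(scratch: Path, store: Path) -> list[str]:
    """Copy new or changed files from scratch into store.

    Returns the relative paths of the files that were copied.
    Hypothetical stand-in for the rsync step, for illustration only.
    """
    copied = []
    for src in sorted(scratch.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(scratch)
        dst = store / rel
        # Skip files whose content is already identical in the store, so
        # their timestamps stay untouched and an incremental backup
        # has no reason to pick them up again.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Running it twice in a row on the same scratchpad copies nothing the second time, which is the property that keeps the daily incremental backup small.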

This process requires extra free disk space of about twice the size of the OE storage folder, but that is a small price to pay compared to the daily savings in backup disk space.


Sunday, April 15, 2007

Upgrading from "etch" to "lenny"

[25 Feb. 2009] UPDATE: this is an old post about upgrading Debian/testing. If you're considering upgrading Debian/stable, please read the official upgrade instructions.

Well, Debian "etch" is now officially Stable. Turning stable meant one thing: a major upgrade.

I like the fact that the software I use gets routinely updated. Going stable means that "etch" only gets security updates from now on. So, on to "lenny" - the new Testing distribution.

The only thing required to perform the transformation is to edit /etc/apt/sources.list and make sure that every reference to etch is replaced with testing. After that it's just a matter of pressing Ctrl-Alt-F1, logging in as root, and running

apt-get update
apt-get dist-upgrade

and then reboot.
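For what it's worth, the sources.list edit mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name retarget_sources and the sample lines in the usage note are made up, and it replaces the release name only where it appears as a whole word.

```python
import re


def retarget_sources(text: str, old: str = "etch", new: str = "testing") -> str:
    """Return text with every whole-word occurrence of old replaced by new."""
    # \b keeps the match to the release name itself, so "etch/updates"
    # becomes "testing/updates" while a word such as "sketch" is untouched.
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return pattern.sub(new, text)
```

For example, given the illustrative lines "deb http://ftp.debian.org/debian/ etch main" and "deb http://security.debian.org/ etch/updates main", the function returns the same lines with testing and testing/updates in place of etch and etch/updates. Writing the result back to /etc/apt/sources.list (after keeping a copy of the original) is left out on purpose.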

Sounds easy enough ...

Well, not quite. But it wasn't too painful either. I encountered three issues:
  1. the djvulibre-plugin had a broken dependency - running
    apt-get -f dist-upgrade
    (as instructed by apt-get itself) resolved this issue;
  2. bacula was updated from version 1.38 to 2.0 - I intend to share my experiences regarding this issue in an upcoming post;
  3. gallery2 was updated from version 2.1 to 2.2 - I had to visit all three gallery sites that I maintain and let the upgrade wizard step me through the upgrade process.

Needless to say (but I'll say it anyway), I made sure that a valid backup was ready, just in case, before I started this.

The whole process (not counting bacula maintenance) took about an hour.
And now back to work...

[04 Nov. 2008] UPDATE: I've recently spotted the Lenny upgrade-advisor. It's a modular tool that's meant to perform some sanity checks on your system before upgrading (README). I haven't tried it myself - I'm just passing the word...