Friday, February 12, 2010

Dynamic Bacula Filesets

Bacula provides a rather flexible method for specifying the files and directories to include in and/or exclude from backup jobs - the Fileset resource.

It is, in fact, so flexible that you can use an external script or program to generate the list of files to back up, on the fly. That program is expected to write the list of paths to standard output, which is piped to Bacula, either at the server side (director):
File="|/path/to/script <args>"
or at the client side (file daemon):
File="\\|/path/to/script <args>"

Running the fileset generation program at the client side has the advantage of running with super-user privileges.

Here's an example, adapted from the Bacula manual, for backing up all local Ext3 disk partitions:
FileSet {
    Name = "All local partitions"
    Include {
        Options { signature=SHA1; onefs=yes; }
        File = "\\|bash -c \"df -klF ext3 | tail -n +2 | awk '{print \$6}'\""
    }
}
Note that, if there are other include/exclude criteria in the Fileset, the file daemon still has to determine which files and directories to back up under each parent directory that is specified by the external program.

A similar method can be used to work around Bacula's file selection logic completely. One reason to do this would be to select files according to criteria that cannot be expressed using the normal Fileset resource definition syntax (e.g. file selection by date).
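To make the "selection by date" idea concrete, here is a minimal Python sketch. It is not the script used in this post; the function name `recent_files` and its parameters are hypothetical. It collects every regular file under a root directory that was modified within the last N days, which is the sort of list a generator script could print to standard output.

```python
import os
import time


def recent_files(root, max_age_days, now=None):
    """Return files under root modified within the last max_age_days days.

    Hypothetical helper for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                matches.append(path)
    return sorted(matches)
```

A generator script would then simply print each returned path on its own line.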

I became interested in this after I learned that if I specify /grandparent/parent/child/file as a backup target, Bacula does not back up the permissions and ownership info of any of the parent directories. This happens because none of the parent directories is a backup target itself, or a sub-directory of a directory which is a backup target.

This isn't a bug, but rather just the way Bacula is designed. It actually makes sense when you think about it. But the end result is that if you cherry-pick directories to back up (like I do), you may end up with some non-obvious permissions/ownership problems upon a full restore, because some parent directories were not specified as backup targets.

Turns out it's so tricky and cumbersome to get the behavior I want using the usual Fileset definition constructs, that using an external script for selecting files is actually the easy solution to my problem.

There are, however, a few gotchas that I had to address before I could deploy this scheme.

Say we have a program (more specifically, a Python script) that, when run, dumps to standard output a list of the files and directories we wish to back up. The Fileset resource we would use in this case looks like this (for a Linux client):
FileSet {
    Name = linux-client-fileset
    Include {
        Options {
            signature = SHA1
            compression = GZIP
            wild = "*"
            Exclude = yes
        }
        File = "\\|/usr/bin/env LANG=en_US.UTF-8 /usr/bin/python /path/to/ <args>"
    }
}
If you ponder this for a bit you'll note that this definition excludes any file/directory that does not appear in the list dumped by our script - which is exactly what we want. The tricky part here is that the list has to be reverse sorted, such that any sub-directory path appears in it before its parent directory; otherwise Bacula will filter it out.
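The ordering requirement can be sketched in a few lines of Python. This is an illustration under assumptions, not the script used in this post: the function name `expand_with_parents` is hypothetical, targets are assumed to be POSIX-style paths, and the root directory itself is deliberately left out of the output. The sketch adds every ancestor directory of each target (so that their permissions and ownership get backed up too) and then sorts in descending order, which places each child before its parent.

```python
import posixpath
import sys


def expand_with_parents(targets):
    """Return targets plus their ancestor directories, children first.

    Hypothetical helper for illustration only.
    """
    selected = set()
    for target in targets:
        path = posixpath.normpath(target)
        # Walk up the tree, stopping before the root (or an empty path).
        while path.strip("/") not in ("", "."):
            selected.add(path)
            path = posixpath.dirname(path)
    # A parent path is a strict prefix of its children, so a descending
    # string sort always places a child before its parent.
    return sorted(selected, reverse=True)


if __name__ == "__main__":
    # Assumed usage: backup targets are passed as command-line arguments.
    for entry in expand_with_parents(sys.argv[1:]):
        print(entry)
```

The descending sort works because a string always compares greater than any of its proper prefixes, so no explicit depth calculation is needed.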

Another issue, which I was not aware of initially, is that the locale information isn't propagated to the sub-process running the script. The tricky bit here is that locale information is propagated after manually restarting the file daemon process - the restarted process seems to inherit the environment settings of the shell that was used to restart it. A simple solution is to explicitly specify the value of the LANG environment variable, as I've done above.

The next issue I had to tackle was that when the file list is generated by a script, it's apparently generated before the client executes any of the ClientRunBeforeJob scripts that are configured in the backup job definition. This means that, if you create new files as part of the operation of the pre-backup scripts, these files will not be included in the current backup job. This is different than the normal state of affairs.

I had to split the backup job for each of the client machines that I back up into two jobs: one job that runs the ClientRunBeforeJob scripts but uses an empty fileset (i.e. one that doesn't have any File directive), and a second job that runs afterwards and uses the dynamic fileset.

The last problem, but not the least, was getting this scheme to work both on Window$ and Linux, with filenames that happen to contain illegal characters, using the same Python script. This was an interesting exercise in its own right, but I'll leave that to a future post.

OK, so it's complicated, and the benefits are dubious, but you've read so far. You're too kind. Thanks.


  1. I would add another issue to your list: what if the script fails to execute on the client side? "File=" would be a null string and the Job would end with success status.
    Is there any way to make the job fail if the script fails?

    1. Either I never hit this situation, or when my script failed (i.e. exited with a non-zero return code) the job also failed, but I may be mistaken - it's been quite a while...

      In any case, even if Bacula treats a null list as valid, you can always have your script save the number of files (or any other relevant info) to a file, and use a ClientRunBeforeJob script to read this file in and fail the job if needed.

      Yet another kludge...

  2. Minor correction:

    File = "\\|bash -c \"df -klF ext3 | tail -n +2 | awk '{print \$6}'\""

    should be (at least on CentOS 5/6)

    File = "\\|bash -c \"df -lPt ext3 | tail -n +2 | awk '{print \$6}'\""

    t = type, P = Posix, critical because the default LVM names are too long:

    [root@dbnlw01p ~]# df -t ext3
    Filesystem 1K-blocks Used Available Use% Mounted on
    29741864 6628860 21577824 24% /

    Either way: thanks for this!

  3. Bad link to