Friday, April 29, 2011

Backup a Blogger Blog - Revisited

I recently found out that the method I used to back up this blog automatically has stopped working. It all hinged on the observation that one can retrieve a single web page with the full text of all the posts of a given Blogger blog by retrieving the link
http://blogname.blogspot.com/search?max-results=N
with a large enough N, e.g. 10000. This is not true anymore - at the moment I can only retrieve the latest 42 posts on this blog, from a total of 178.

I'm not sure when this method stopped working, or, for that matter, if it ever really worked. The fact remains, however, that I have a blog that I want to back up, so I spent the better part of an evening figuring out how to do this properly.

My new Blogger blog backup script, shown below, uses the Google Data services API to export and download the blog archive in XML format, extracts the links of all the posts from it, and mirrors these pages locally with HTTrack:


#! /bin/bash

BLOGGER_EMAIL=user@gmail.com
BLOGGER_PASSWD=password
BLOGGER_BLOGID=000000000000000000
BLOGGER_BLOG=blogname

DEST_DIR=/path/to/backup/directory/
mkdir -p "${DEST_DIR}"
cd "${DEST_DIR}" || exit 1

eval $( \
    curl -s "https://www.google.com/accounts/ClientLogin" \
    --data-urlencode Email=$BLOGGER_EMAIL --data-urlencode Passwd=$BLOGGER_PASSWD \
    -d accountType=GOOGLE \
    -d source=MachineCycle-cURL-BlogBackup \
    -d service=blogger | grep 'Auth='
)

curl -s "http://www.blogger.com/feeds/$BLOGGER_BLOGID/archive" \
    --header "Authorization: GoogleLogin auth=$Auth" \
    --header "GData-Version: 2" \
    | xml_pp > ${BLOGGER_BLOG}.blogspot.com.archive.xml

grep -o -e '<link href="http://'"$BLOGGER_BLOG"'\.blogspot\.com/..../[^.]*\.html" rel="alternate" title=' \
    ${BLOGGER_BLOG}.blogspot.com.archive.xml | \
    sed -e 's@.link href="@@g' -e 's@" rel="alternate" title=@@g' | \
    sort -ur > ${BLOGGER_BLOG}.links

mkdir -p "${BLOGGER_BLOG}"
cd "${BLOGGER_BLOG}" || exit 1
httrack \
    -%v0 \
    -%e0 \
    -X0 \
    --verbose \
    --update \
    -%L ../${BLOGGER_BLOG}.links \
    -"http://${BLOGGER_BLOG}.blogspot.com/" \
    -"${BLOGGER_BLOG}.blogspot.com/*widgetType=BlogArchive*" \
    -"${BLOGGER_BLOG}.blogspot.com/search*" \
    -"${BLOGGER_BLOG}.blogspot.com/*_archive.html*" \
    -"${BLOGGER_BLOG}.blogspot.com/feeds/*" \
    -"${BLOGGER_BLOG}.blogspot.com/*.html?showComment=*" \
    +"*.gif" \
    +"*.jpg" \
    +"*.png"

A few comments are in order:
  1. the script contains the Blogger username and password - keep it safe!
  2. the blog id is the number that appears in the URL of most links accessible from the Blogger dashboard, after the blogID= part
  3. the XML blog archive may later be used to restore/migrate the blog
  4. local mirroring isn't really necessary - I just like it that I can view the blog contents offline
  5. another unnecessary step: I use xml_pp to beautify the exported XML file
  6. currently, the script performs no error checking - I may add some checks if and when I observe failures
  7. sources: Using cURL to interact with Google Data services, Blogger export format
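
Two of the script's steps can be sketched as small functions with basic error checking, along the lines of comment 6. This is only a sketch under assumptions: the function names parse_auth_token and extract_post_links are hypothetical, the login response is assumed to carry the token on a line starting with Auth= (which is what the eval in the script relies on), and the archive is assumed to have the same <link href="..." rel="alternate" attribute layout that the grep in the script expects.

```shell
#!/bin/bash

# Hypothetical helper: read a login response on stdin and print the value
# of its Auth= line. Fails (non-zero status) if there is no such line,
# e.g. when the response only contains an Error= line.
parse_auth_token() {
    local token
    token=$(sed -n 's/^Auth=//p' | head -n 1)
    [ -n "$token" ] || return 1
    printf '%s\n' "$token"
}

# Hypothetical helper: print the unique post URLs found in an archive file.
# Usage: extract_post_links BLOGNAME ARCHIVE_FILE
# Assumes post URLs look like http://BLOGNAME.blogspot.com/YYYY/MM/slug.html
extract_post_links() {
    local blog="$1" archive="$2"
    [ -r "$archive" ] || return 1
    grep -o -E "<link href=\"http://${blog}\.blogspot\.com/[0-9]{4}/[0-9]{2}/[^\"]*\.html\" rel=\"alternate\"" "$archive" \
        | sed -e 's@^<link href="@@' -e 's@" rel="alternate"$@@' \
        | sort -u
}

# Possible use inside the backup script (not executed here):
#   Auth=$(curl -s ... | parse_auth_token) || { echo "login failed" >&2; exit 1; }
#   extract_post_links "$BLOGGER_BLOG" "${BLOGGER_BLOG}.blogspot.com.archive.xml" > "${BLOGGER_BLOG}.links"
```

Parsing the token with sed instead of eval means the login response is never executed as shell code, and a failed login stops the script before it downloads an empty archive.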

Friday, April 15, 2011

Fixing Normalize Audio Feature in K3b

I usually burn and copy optical media with K3b. I don't do this often, but when I do, it usually just works. Except when it doesn't.

My wife asked me to create an audio CD for her, from an assortment of audio tracks she collected from various sources. It was easy enough to accomplish this with K3b. But the resulting audio CD was annoying to listen to, because I had to change the volume setting for each track.

It's a classic noob's mistake: I should've enabled audio normalization.

So I tried it again, but found that I couldn't enable audio normalization in K3b. Turns out that K3b uses an external application (called - surprise! - normalize-audio) to perform this task, and K3b just couldn't find it - a fact that was clearly (?) stated in the programs section of the K3b settings dialog.

I tried launching normalize-audio at the command line, and it seemed to be installed alright. A quick Net search brought me to Debian bug #597155 and Ubuntu bug #45026. The root cause of the problem is that normalize-audio reports its version number as
normalize 0.7.7
while K3b expects
normalize-audio 0.7.7

This can be fixed either in K3b or in normalize-audio, and patches for both sides have already been posted. But neither has been incorporated yet. In the meantime, I've implemented a workaround, based on suggestions in those bug reports:
  1. create (as root) a script named normalize-audio under /usr/local/bin/ with the following contents:
    #!/bin/bash
    case "$1" in
        --version)
            /usr/bin/normalize-audio --version | sed -e 's/^normalize /normalize-audio /'
            ;;
        *)
            /usr/bin/normalize-audio "$@"
            ;;
    esac
  2. make this script executable:
    chmod a+x /usr/local/bin/normalize-audio
  3. this script is supposed to be used as a wrapper for normalize-audio, so make sure that /usr/local/bin/ appears in the PATH environment variable, and that it comes before /usr/bin
  4. launch K3b from a new command shell - it should now detect normalize-audio and allow you to use it
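
The PATH ordering required in step 3 can be checked with a few lines of bash. This is a minimal sketch, and the function name path_precedes is hypothetical; it only compares directory names as they are written in PATH and does not resolve symbolic links.

```shell
#!/bin/bash

# Hypothetical helper: succeed if directory $1 appears before directory $2
# in the colon-separated list $3 (defaults to the current PATH).
path_precedes() {
    local first="${1%/}" second="${2%/}" list="${3:-$PATH}"
    local dir
    local IFS=':'
    for dir in $list; do
        dir="${dir%/}"                        # ignore a trailing slash
        [ "$dir" = "$first" ] && return 0     # wrapper directory comes first
        [ "$dir" = "$second" ] && return 1    # real binary would be found first
    done
    return 1                                  # neither directory is listed
}

# Possible use (not executed here):
#   path_precedes /usr/local/bin /usr/bin || echo "wrapper will not be picked up" >&2
```

Running type -a normalize-audio in the same shell lists every match in PATH order, so the wrapper under /usr/local/bin should be the first line of its output.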

Time to Burn!
[05 Nov 2011] UPDATE: fixed script to work with file names/paths that contain spaces (Thanks anonymous commenter!)