Saturday, June 28, 2008

Selecting the Emacs Spell Checker Default Dictionary

While writing the upcoming blog post I noticed that Emacs tells me it's using the British dictionary while spell checking my text (via M-x ispell-buffer, or in flyspell-mode). I have no idea how it got to be like that, but I'm more comfortable with American spelling.

In any case, after verifying that I had the iamerican package installed (the American dictionary for ispell) I ran
dpkg-reconfigure dictionaries-common
selected the American English dictionary, and then restarted Emacs. Dandy.

Tuesday, June 17, 2008

UnDBX: Extract E-Mail Messages from Outlook Express DBX files

If you can't beat them, join them.

Here's my tiny, hopefully half decent, contribution to the FOSS universe: UnDBX, a command-line utility that I've developed to extract e-mail messages from Outlook Express DBX files.

There are many such utilities around, so why write another one? Because I had to.

As I described on this blog some time ago, I used to back up my wife's mailboxes with a combination of DbxConv and rsync, launched from a VB script run by the Bacula file daemon (phew!). This allowed me to back up a few megabytes of data a day (i.e. just the new messages), instead of several gigabytes (i.e. a bunch of very large monolithic DBX files). The objective was to save precious disk space on my backup device (an external USB hard disk). The price was a complicated backup scheme, wasted disk space on my wife's PC, and long backup jobs (more than 3 hours every night!).

This backup scheme failed mysteriously several times. Debugging it was a real pain, simply because it takes so much time to complete a backup job. I finally decided, almost three months ago, to stop using it and back up the gigantic DBX files directly, until I could come up with a better solution.

My original intent was to add an incremental extraction option to DbxConv, so that it would only extract to disk e-mail messages that haven't been extracted yet. That would make the extraction process much shorter, and also save disk space because a scratch folder is not needed anymore. As I browsed through the DbxConv source code I realized that I can't modify it, because it uses MFC, and MFC is not available in MinGW, which is the toolchain I have available in Debian.

The solution? UnDBX - the DBX extraction tool.

I ported the DbxConv DBX parsing code from C++ with MFC to plain C, and wrote a main function that extracts messages from all the DBX files in a specified folder, to a sub-folder of a given output folder. The first round works very much like DbxConv - all messages are extracted to disk as EML files. Subsequent runs only extract new messages to disk, and also delete EML files on the disk that do not correspond to messages in the DBX files (i.e. deleted messages).
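
The incremental step boils down to two set differences. Here is a minimal sketch of that idea in Python (UnDBX itself is written in C, and this is not its actual code). The function name and the assumption that every message maps to a unique EML file name are mine, for illustration only:

```python
def plan_incremental_sync(messages_in_dbx, eml_files_on_disk):
    """Decide what an incremental run has to do.

    Both arguments are collections of EML file names (hypothetical: it is
    assumed here that each message maps to one unique, stable file name).
    Returns (to_extract, to_delete) as sorted lists.
    """
    in_dbx = set(messages_in_dbx)
    on_disk = set(eml_files_on_disk)
    # Messages present in the DBX file but not yet on disk are new.
    to_extract = sorted(in_dbx - on_disk)
    # Files on disk with no matching message were deleted from the DBX file.
    to_delete = sorted(on_disk - in_dbx)
    return to_extract, to_delete
```

On the first run the disk is empty, so everything lands in the extraction list. On later runs only the differences are touched, which is where the time saving comes from.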

Unlike DbxConv, UnDBX cannot convert DBX files to MBOX files - its sole purpose is to facilitate fast incremental backup of DBX files.

Backup jobs are down to 8 minutes! That's with 14 DBX files, over 35000 messages, and 3.5GB of data - a nightmare. I hope some of you will find it useful too. Enjoy.

Saturday, June 14, 2008

Pipe Dreams (or: VBScript, Spawned Processes and StdOut/StdErr Capture)

I mentioned before that I'm writing a small console utility for Window$ that reads and writes a lot of files. It works nicely on my Debian machine, both when compiled natively and when cross compiled for Window$ and run with Wine. It even works when run in a console window on my wife's PC.

So far so good, but I intend to spawn the program from within a VB script, run by the Windows Script Host. So I wrote a little script that (I thought) does exactly that. It runs a command and echoes its output:

' do.vbs - run a command and echo its output
' usage:
'   cscript do.vbs "command arguments ..."
Set WshShell = CreateObject("WScript.Shell")
If WScript.Arguments.Count = 1 Then
    runCommand WScript.Arguments.Item(0)
Else
    WScript.Echo "Please supply command to run, enclosed in double quotes."
End If
Set WshShell = Nothing

Sub runCommand(strCommand)
    Set objScriptExec = WshShell.Exec(strCommand)
    strStdOut = objScriptExec.StdOut.ReadAll
    WScript.Echo strStdOut
    Set objScriptExec = Nothing
End Sub

This works nicely with commands like "dir C:" or "ipconfig /all", or any other program that only outputs text to the standard output stream (StdOut). Trouble starts when the program in question also outputs text to the standard error stream (StdErr) - a common practice among console utilities, mine included.

Such programs simply hang.

How lame.

Yes, even if you try to capture StdErr with StdErr.ReadAll.

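Well, it seems that only one stream can be safely captured like this. It behaves like a deadlock: ReadAll on one stream blocks until that stream is closed, while the program itself blocks as soon as the buffer of the other, unread, pipe fills up. You can get it to work for some programs (as in this Micro$oft knowledge base article). But in general it's hopeless.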

Here's the best workaround I could come up with for this (tested on WinXP Home edition, YMMV):

Sub runCommand(strCommand)
    Set objScriptExec = WshShell.Exec("cmd /c " & strCommand & " 2>NUL")
    strStdOut = objScriptExec.StdOut.ReadAll
    WScript.Echo strStdOut
    Set objScriptExec = Nothing
End Sub

This completely discards the contents of StdErr. Alternatively, you may want to replace NUL with a path to a file, so that StdErr will be redirected to that file.

So very lame.

[29 Oct 2008] UPDATE: a kind anonymous soul posted a comment, providing a better workaround:

Sub runCommand(strCommand)
    Set objScriptExec = WshShell.Exec("cmd /c " & strCommand & " 2>&1")
    strStdOut = objScriptExec.StdOut.ReadAll
    WScript.Echo strStdOut
    Set objScriptExec = Nothing
End Sub

which not only prevents the script from hanging, but also allows it to collect messages from both StdOut and StdErr. Thanks!
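
For comparison, here is the same trick outside of VBScript: a small Python sketch (not part of my original script) that merges StdErr into StdOut, so that there is only one pipe to drain. The function name run_command is made up for this example. subprocess.STDOUT plays the role of 2>&1:

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return everything it wrote to stdout and stderr."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr down the stdout pipe, like 2>&1
        text=True,
    )
    return result.stdout


# A noisy child process that writes a lot to both streams.
NOISY_CHILD = (
    "import sys\n"
    "sys.stderr.write('e' * 200000)\n"
    "sys.stdout.write('o' * 200000)\n"
)
```

Calling run_command([sys.executable, "-c", NOISY_CHILD]) returns the output of both streams in a single string, and it does not hang, because the child never has to wait on a pipe that nobody reads.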

Wednesday, June 11, 2008

Iceweasel, Plugins and Add-ons! Oh My!

I clicked on a link to a PDF file and nothing happened. I'm used to it - the combination of my slow machine, Acrobat Reader and the World-Wide-Wait is enough to grind my wide-band Internet connection to a halt. Only that this time nothing happened.

Well, actually, how could I be sure? The halting problem is one of the more practically significant theorems that I'm aware of. Anyway, I clicked several times, with no response. I middle-clicked, to open the link in a different tab, and still nothing happened. It worked a few days ago. What gives?

Maybe it's a problem with the Acrobat Reader Iceweasel plugin? I opened about:plugins but the plugin seemed to be installed correctly. I reinstalled it anyway:
aptitude reinstall mozilla-acroread
No go. No surprise.

What next? - I usually launch Iceweasel with a shortcut key, so I tried launching it from a terminal window, in the hope that some diagnostic message would show up there. Nah. Get real. Why should I be so lucky?

I clicked again on the link and I suddenly noticed that the downloads statusbar appeared - I was wrong, something does happen when I click on a link to a PDF file: it gets downloaded automatically to some directory. This was weird on two counts: first off, it was downloaded instead of being opened by the Acrobat Reader plugin; the other problem was that it was downloaded automatically to some directory even though I've set an option in the download preferences to make Iceweasel always ask me for a target directory.

Maybe it's a problem with the downloads statusbar add-on? I disabled it:
  1. select menu Tools -> Add-ons
  2. select the Extensions tab
  3. find the add-on to disable and press the Disable button
  4. quit and run Iceweasel again
It still did not work.

So maybe it's a problem with file types? Let's check:
  1. select menu item Edit -> Preferences
  2. select the Content tab
  3. click the button "Manage..." in the File Types section
  4. verify that an action is registered for the PDF file type
It looked OK.

I tried opening the same link in a different tab using the context menu (right-click) and, surprisingly enough, it worked. I tried opening the link from the history panel (Ctrl-H), and again, it worked!

So it wasn't a file type problem after all. But what was it?

I decided to check where the PDF was downloaded to, and was surprised to find it in ~/iMacros/Downloads. Aha!

I installed the iMacros add-on several days ago because I use keyboard macros in Emacs a lot and browser automation via macros sounded like a good idea at the time. I tried it once, was impressed by its potential, but realized that I didn't really need browser automation. I'm fickle-minded. Sue me. I decided to leave it installed just in case, and then forgot all about it. It just so happens that this version seems to be buggy.

I disabled iMacros, restarted Iceweasel, and mouse clicks on PDF links started working again.

Joy.

Sunday, June 8, 2008

X.Org 7.3: The Good, The Bad and The Ugly (3)

It was Good, it was Bad, and finally it went horribly Ugly. The so called "User Experience", that is.

The story so far: after upgrading X.Org to version 7.3, my laptop would completely lock up at startup, upon switching from console display to graphical display. After some futzing around I isolated the problem to the ATI display driver.

What now? My options seemed clear:
  1. downgrade the driver (and, due to dependencies, all of X.Org) to the previous, working, version, file a bug report, and then wait for a fix...
  2. apply one of the workarounds that I found, file a bug report, and then wait for a fix...
Being me, I started exploring another option: try to debug and fix it myself, file a bug report containing a patch that fixes the problem, and then wait for it to be included upstream...

I went over to the ATI driver page on the Debian PTS, and found out that the package source code repository is managed with Git. This was great news.

In brief, Git provides a tool called git bisect that (in theory) allows anyone (including non-programmers - again, in theory) to find the cause of a software bug by isolating a single bad commit (i.e. a single batch of source code modifications) that is causing it. But there's no guarantee that the problem is caused by a single commit. I decided to play the optimist (for a change) and dived in - head first.

First things first: install Git, like this:
aptitude install git-core gitk
If you're running a firewall, you'd better open port 9418 for outgoing TCP connections. I use shorewall:
  1. add the following line to /etc/shorewall/rules:
    ACCEPT      $FW      net        tcp     9418
  2. restart the firewall
    invoke-rc.d shorewall restart
Next, clone the source code repository:
git clone git://git.debian.org/git/pkg-xorg/driver/xserver-xorg-video-ati
Now figure out how to build, install and test it, which, in this case, is as simple as:
cd xserver-xorg-video-ati; dpkg-buildpackage -rfakeroot -b -tc -uc
dpkg -i ../xserver-xorg-video-ati_6.8.0-1_i386.deb
... and then reboot.

This is where the fun starts. You start out by telling Git that a bisection process has started and marking the current version as bad:
cd xserver-xorg-video-ati
git bisect start
git bisect bad
We now need to mark the previous version as good:
git checkout -f xserver-xorg-video-ati-1_6.6.3-4; git clean -d -f
git bisect good
Git responds by selecting a commit halfway between the bad and good commits:
Bisecting: 426 revisions left to test after this
[2f87bff293a343b40c1be096933a5ae126632468] RADEON: Fix subtle change in crtc reg init
At this point we need to build this halfway snapshot, test it and tell Git if it works or not with git bisect good or git bisect bad, respectively.
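
What Git does here is essentially a binary search. Here's a toy sketch of it in Python, assuming a strictly linear history and a single culprit commit (the real repository has merges, so Git's own selection is more involved). The names find_first_bad and is_bad are made up for this example; is_bad stands for the manual build, install, reboot and test cycle:

```python
def find_first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest to newest; commits[0] is known to be good
    and commits[-1] is known to be bad. is_bad(commit) tests one snapshot.
    Returns (first_bad_commit, number_of_snapshots_tested).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # the halfway snapshot
        steps += 1
        if is_bad(commits[mid]):
            bad = mid   # like "git bisect bad"
        else:
            good = mid  # like "git bisect good"
    return commits[bad], steps
```

Each test halves the remaining range, so under these assumptions the number of snapshots to test grows only with the logarithm of the number of commits.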

So much for theory. I couldn't build the halfway snapshot that I got! The problem was rather odd - there was no debian sub-directory. I figured out what happened by using gitk to inspect the commit history in the repository.

It turns out that the Debian package Git repository contains both downstream unique files (i.e. the debian directory and its contents) and the upstream source code. Occasionally, when a new version of the driver's package is being prepared, upstream commits are pulled into the downstream repository and merged. The debian directory is missing from the upstream commits (and this is as it should be), so whenever Git bisects the downstream repository it is most likely to check out a snapshot without this directory.

My solution to this was to have two clones of the package Git repository - one of them was used only for bisection, and the other for actual package building and testing. After each bisection step I pulled from the first repository to the second repository, which was reset beforehand to the previous working version. This way I got a repository that included both the debian directory and the commits up to the current bisection point.

It took around 13 iterations (read: around 13 reboots) before I hit the jackpot (did I mention that this is the Ugly part of my story?).

Eventually Git informed me that
80eee856938756e1222526b6c39cee8b5252b409 is first bad commit
RADEON: fix console restore on netbsd
This looked very relevant, but after inspecting the source code I was stumped: it was obvious that some hardware registers/modes were being saved/restored, but to what end? What did this "fix" actually fix? And more importantly: what did this NetBSD related fix break on my box?

The only fix I could come up with was to revert the effect of this modification - but only under Linux. And what do you know? It solved my problem! I incorporated my fix into the current version, and it started to work fine (in case you're keeping count: two more reboots).

I reported the bug on the Debian BTS, complete with a patch (see bug #480312). My patch was eventually committed into the upstream Git repository a few days later.

A happy end?

I later spent some time browsing through more of the code, and my fix seemed to be at home: the driver's code contains quite a few code fragments that are either enabled or disabled, depending on both hardware type and target platform. It's quite obvious that the upstream author(s) of the driver need all the help they can get - the task they took upon themselves isn't easy.

I have a strong suspicion that it will break again - I just hope that I'll upgrade my hardware by then...