The story so far: after upgrading X.Org to version 7.3, my laptop would lock up completely at startup, when switching from the console display to the graphical display. After some futzing around I isolated the problem to the ATI display driver.
What now? My options seemed clear:
- downgrade the driver (and, due to dependencies, all of X.Org) to the previous, working, version, file a bug report, and then wait for a fix...
- apply one of the workarounds that I found, file a bug report, and then wait for a fix...
I went over to the ATI driver page on the Debian PTS, and found out that the package source code repository is managed with Git. This was great news.
In brief, Git provides a tool called git bisect that (in theory) allows anyone (including non-programmers - again, in theory) to find the cause of a software bug by isolating a single bad commit (i.e. a single batch of source code modifications) that is causing it. But there's no guarantee that the problem is caused by a single commit. I decided to play the optimist (for a change) and dived in - head first.
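The idea behind bisection is an ordinary binary search over the history. Here is a minimal sketch of that idea, assuming a strictly linear history numbered 0 to 100 and a made-up test. The names find_first_bad, is_bad and FIRST_BAD are hypothetical and are not part of Git. Real histories contain merges, so git bisect itself has more work to do than this.

```shell
# Hypothetical stand-in for "build, install, reboot and see whether X starts".
# Assumption for this sketch: every revision numbered FIRST_BAD or higher is broken.
FIRST_BAD=${FIRST_BAD:-37}
is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# Binary search for the first bad revision.
# $1 = a revision known to be good, $2 = a later revision known to be bad.
find_first_bad() {
    good=$1
    bad=$2
    tests=0
    # Keep halving the range until good and bad are adjacent.
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        tests=$((tests + 1))
        if is_bad "$mid"; then
            bad=$mid      # the breakage is at mid or earlier
        else
            good=$mid     # the breakage is after mid
        fi
    done
    echo "first bad revision: $bad (found in $tests tests)"
}

find_first_bad 0 100
```

Each test halves the range, so the number of tests grows with the logarithm of the number of revisions. That matters a lot when every test costs a reboot.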
First things first: install Git, like this:
aptitude install git-core gitk
If you're running a firewall, you'd better open port 9418 for outgoing TCP connections. I use shorewall:
- add the following line to /etc/shorewall/rules:
ACCEPT $FW net tcp 9418
- restart the firewall
invoke-rc.d shorewall restart
Next, clone the driver's package repository:
git clone git://git.debian.org/git/pkg-xorg/driver/xserver-xorg-video-ati
Now figure out how to build, install and test it, which, in this case, is as simple as:
cd xserver-xorg-video-ati; dpkg-buildpackage -rfakeroot -b -tc -uc
dpkg -i ../xserver-xorg-video-ati_6.8.0-1_i386.deb
... and then reboot.
This is where the fun starts. You start out by telling Git that a bisection process has started and marking the current version as bad:
cd xserver-xorg-video-ati
git bisect start
git bisect bad
We now need to mark the previous version as good:
git checkout -f xserver-xorg-video-ati-1_6.6.3-4; git clean -d -f
git bisect good
Git responds by selecting a commit halfway between the bad and good commits:
Bisecting: 426 revisions left to test after this
[2f87bff293a343b40c1be096933a5ae126632468] RADEON: Fix subtle change in crtc reg init
At this point we need to build this halfway snapshot, test it and tell Git if it works or not with git bisect good or git bisect bad, respectively.
So much for theory. I couldn't build the halfway snapshot that I got! The problem was rather odd - there was no debian sub-directory. I figured out what happened by using gitk to inspect the commit history in the repository.
It turns out that the Debian package Git repository contains both the downstream-only files (i.e. the debian directory and its contents) and the upstream source code. Occasionally, when a new version of the driver's package is being prepared, upstream commits are pulled to the downstream repository and merged. The debian directory is missing from the upstream repository (and this is as it should be), so that whenever Git bisects the downstream repository it is most likely to check out an upstream commit, leaving a working tree without this directory.
My solution to this was to have two clones of the package Git repository - one of them was used only for bisection, and the other for actual package building and testing. After each bisection step I pulled from the first repository into the second repository, which was reset beforehand to the previous working version. This way I got a repository that included both the debian directory and the commits up to the current bisection point.
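The post does not record the exact commands I used for this, so the following is only a sketch of how one round of the two-clone dance could look. The helper name sync_build_clone and its arguments are assumptions made up for illustration. It uses git fetch followed by git merge, which is what a pull does under the hood, and git -C, which needs a reasonably recent Git.

```shell
# Hypothetical helper, not the original commands from this post.
# $1 = absolute path of the clone used only for "git bisect"
#      (its HEAD is the snapshot Git currently wants tested)
# $2 = absolute path of the clone used for building the package
# $3 = tag or commit of the last known-good packaged version
sync_build_clone() {
    bisect_dir=$1
    build_dir=$2
    good_ref=$3

    # Put the build clone back on the known-good packaged version,
    # throwing away leftovers from the previous build.
    git -C "$build_dir" reset -q --hard "$good_ref" || return 1
    git -C "$build_dir" clean -q -d -f || return 1

    # Fetch the current bisection point from the bisection clone ...
    git -C "$build_dir" fetch -q "$bisect_dir" HEAD || return 1

    # ... and merge it, so the tree holds both the debian directory
    # and the commits up to the bisection point.
    git -C "$build_dir" merge -q --no-edit FETCH_HEAD || return 1
}
```

A round would then be something like sync_build_clone /path/to/bisect-clone /path/to/build-clone xserver-xorg-video-ati-1_6.6.3-4 (both paths are placeholders), followed by the usual build, install and reboot in the build clone, and git bisect good or git bisect bad in the bisection clone. A merge conflict would make the helper stop with an error, and would have to be sorted out by hand.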
It took around 13 iterations (read: around 13 reboots) before I hit the jackpot (did I mention that this is the Ugly part of my story?).
Eventually Git informed me that
80eee856938756e1222526b6c39cee8b5252b409 is first bad commit
RADEON: fix console restore on netbsd
This looked very relevant, but after inspecting the source code I was stumped: it was obvious that some hardware registers/modes were being saved/restored, but to what end? And what did this "fix" actually fix? And more importantly: what did this NetBSD-related fix break on my box?
The only fix I could come up with was to revert the effect of this modification - but only under Linux. And what do you know? It solved my problem! I incorporated my fix into the current version, and it started to work fine (in case you're keeping count: two more reboots).
I reported the bug on the Debian BTS, complete with a patch (see bug #480312). My patch was eventually committed into the upstream Git repository a few days later.
A happy end?
I later spent some time browsing through more of the code, and my fix seemed to be at home: the driver's code contains quite a few code fragments that are either enabled or disabled, depending on both hardware type and target platform. It's quite obvious that the upstream author(s) of the driver need all the help they can get - the task they took upon themselves isn't easy.
I have a strong suspicion that it will break again - I just hope that I'll upgrade my hardware by then...