Friday, February 24, 2012

Forcing SATA Speed Limit to 1.5Gbps

Recap: I've upgraded the hard disk on my laptop, installed Win7 Ultimate on it from upgrade media, installed Debian GNU/Linux to another partition on the same disk, and configured the Windows Boot Loader for dual boot. I was very pleased. It all seemed to work well, except for a small power management issue.

I got busy installing all the stuff I needed on both operating systems, setting up printers and other hardware, configuring mount points, network shares, backup, ssh, etc. All in all, this took a few days, during which the new disk withstood a lot of read/write operations (gigabytes at a time). Most of this work was rather boring.

And then, following one of many reboots, the Windows Boot Loader failed to start:
Windows failed to start. A recent hardware or software change might be the cause. To fix the problem:

    1. Insert your Windows installation disc and restart your computer.
    2. Choose your language settings, then click "Next."
    3. Click "Repair your computer."

If you do not have this disc, contact your system administrator or computer manufacturer for assistance.

    File: \Boot\BCD
    Status: 0xc000000e
    Info: An error occurred while attempting to read the boot configuration data.


ENTER=Continue
My first reaction was panic. My next reaction was to power cycle the laptop. It came up just fine: Windows seemed to work OK, and another reboot confirmed that the Linux partition was alive too.

I wasn't pleased anymore.

I searched for that error code, and found a lot of complaints, and a lot of "solutions" - but in most of those cases the error was permanent, and a reboot did not make it go away. Mine was intermittent, which was a bad sign, but I had very little to go on, so I let it go.

Two days later Debian suddenly hard-locked on me. After power-cycling the box I was not able to find any error message in any of the log files.

And a few days after that, Debian failed to boot, dropping me into a limited shell, claiming that the boot device was missing.

After yet another Windows Boot Loader failure, I used grub-install to replace it with GRUB2, in the slim hope that it would fix the problem. Even if it didn't, at least I'd be able to inspect the system from within the GRUB2 shell when it failed.

GRUB2 did not fix anything: the laptop would still occasionally fail to boot - either GRUB2 would fail to find its own files, or the kernel would start but later on fail to find the root device.

During all that time, when my laptop was up and running, it would occasionally freeze for brief, but noticeable, periods of time. It took me a while to realize that this was not a case of a slow-to-load website, or icedove just taking its sweet time starting up - these hiccups were correlated with messages like the following being logged to /var/log/kern.log:
ata3: ATA_REG 0x40 ERR_REG 0x0
ata3: tag : dhfis dmafis sdbfis sactive 
ata3: tag 0x0: 1 1 0 1  
ata3.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x6 frozen
ata3.00: failed command: WRITE FPDMA QUEUED
ata3.00: cmd 61/08:00:00:f8:4d/00:00:0c:00:00/40 tag 0 ncq 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: failed command: WRITE FPDMA QUEUED
ata3.00: cmd 61/08:08:18:f8:4d/00:00:0c:00:00/40 tag 1 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

...

ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: nv: skipping hardreset on occupied port
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0 
ata3.00: device reported invalid CHS sector 0 
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0
ata3.00: device reported invalid CHS sector 0
ata3: EH complete
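
For anyone who wants to check how often these events occur, here is a minimal sketch in Python that tallies failed commands and hard link resets per ATA port. It assumes the log lines look like the excerpt above; the function name summarize_ata_errors and the two regular expressions are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as "ata3.00: failed command: WRITE FPDMA QUEUED"
FAILED_CMD = re.compile(r"\b(ata\d+)(?:\.\d+)?: failed command: (.+?)\s*$")
# Matches lines such as "ata3: hard resetting link"
HARD_RESET = re.compile(r"\b(ata\d+): hard resetting link")


def summarize_ata_errors(lines):
    """Count failed commands and hard link resets per ATA port.

    `lines` is any iterable of log lines, for example an open file
    object for /var/log/kern.log.
    """
    failed = Counter()  # keyed by (port, command name)
    resets = Counter()  # keyed by port
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            failed[(m.group(1), m.group(2))] += 1
            continue
        m = HARD_RESET.search(line)
        if m:
            resets[m.group(1)] += 1
    return failed, resets
```

Passing it an open file, as in `summarize_ata_errors(open("/var/log/kern.log"))`, returns the two counters, which can then be compared against the times the machine froze.
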
At first glance it seemed to me that the hard disk was failing. I subjected it to a slew of tests (fsck, smartmontools, MHDD), and in all of them the disk was found to be in excellent shape.

The next suspect in line was the on-board SATA controller. The laptop's mobo is based on nVidia's nForce4 430 (MCP51) chipset. Wikipedia has this to say about it:
There have also been data corruption issues associated with certain SATA 3 Gbit/s hard drives.
The suggestion offered by many people online is to somehow force the SATA speed down to 1.5Gbps. However, as far as I can tell, there are no BIOS settings on this laptop to do that, and the disk itself cannot be forced either (no jumpers).

After some more research I found that the SATA speed can be forced on the kernel command-line:
  1. modify /etc/default/grub:
    GRUB_CMDLINE_LINUX="libata.force=1.5Gbps"
  2. run update-grub as root
  3. reboot
This cannot solve early boot problems, but I was hoping that it would at least prevent the lockups and hiccups.
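
After the reboot it is worth confirming that the option actually reached the running kernel. The sketch below, in Python, extracts any libata.force values from a kernel command line string. The helper names and the sample command line are hypothetical; on a live system the string would come from /proc/cmdline.

```python
def libata_force_settings(cmdline):
    """Return the values passed via libata.force=, in order of appearance.

    libata.force takes a comma-separated list, so every element is
    returned as a separate entry.
    """
    settings = []
    for token in cmdline.split():
        if token.startswith("libata.force="):
            value = token.split("=", 1)[1]
            settings.extend(v for v in value.split(",") if v)
    return settings


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as fh:
        return fh.read()


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro libata.force=1.5Gbps quiet"
print(libata_force_settings(example))
```

An empty list from `libata_force_settings(read_cmdline())` would suggest that update-grub was not run or that a different menu entry was booted.
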

Unfortunately, this did not work as expected - here's what I get in the logs:
ata3: FORCE: PHY spd limit set to 1.5Gbps
ata3: SATA max UDMA/133 cmd 0x30c0 ctl 0x30b4 bmdma 0x3090 irq 23

...

ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
The kernel reports its intention to force the SATA controller to 1.5Gbps, but the link comes up at 3.0Gbps regardless. Sh*t.
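
The two register values in that last line tell the same story. Here is a minimal sketch in Python that decodes them, assuming the kernel prints both registers in hexadecimal and that they follow the standard SATA layout, where bits 4-7 of SStatus hold the negotiated speed and bits 4-7 of SControl hold the allowed speed limit. The function and table names are illustrative.

```python
import re

# SPD field values, assuming the standard SATA SStatus/SControl layout
# (bits 4-7 of each register hold the speed field).
NEGOTIATED = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
LIMIT = {0: "no limit", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

LINK_LINE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_link_line(line):
    """Return (negotiated speed, programmed speed limit) for a link-up line.

    Returns None if the line carries no SStatus/SControl pair.
    """
    m = LINK_LINE.search(line)
    if m is None:
        return None
    sstatus = int(m.group(1), 16)   # values are assumed to be hex
    scontrol = int(m.group(2), 16)
    negotiated = (sstatus >> 4) & 0xF
    limit = (scontrol >> 4) & 0xF
    return (NEGOTIATED.get(negotiated, "unknown"),
            LIMIT.get(limit, "unknown"))


print(decode_link_line("ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"))
# -> ('3.0 Gbps', 'no limit')
```

Under those assumptions, SControl 300 has a speed field of 0, meaning no limit was programmed into the controller at all, which would be consistent with the link still negotiating 3.0 Gbps.
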

I still had another trick up my sleeve, but I've run out of steam here, so you'll have to wait for my next post.

2 comments:

  1. What became of this issue? Did you ever solve it? I've got the same issues with the same chipset.

    1. Never really solved it. My new hard disk finally died due to excessive power cycling (I guess), so I had to put the old disk back in. I finally purchased a new PC and ditched this box altogether.
