Friday, November 6, 2009

On Being S.M.A.R.T

After reading "Watching a hard drive die", "Checking Hard Disk Sanity With Smartmontools" and the Wikipedia article about S.M.A.R.T, I decided to install smartmontools and test my hard drives for problems:
aptitude install smartmontools

The first hard drive that I diagnosed with smartctl was my laptop's primary hard disk:
smartctl -a /dev/hda
This showed a few unsettling results, but luckily no reallocation errors or other critical errors.
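
To look at single attributes rather than the whole report, the attribute table that smartctl -A prints can be filtered. The following is only a sketch: smart_raw is a helper name made up here, and it assumes the usual ten-column table layout, with the attribute name in column 2 and the raw value starting in column 10.

```shell
# Hypothetical helper: print the raw value of one attribute, reading
# "smartctl -A" output on stdin. Assumes the attribute name is in
# column 2 and the raw value starts in column 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (as root), not run here:
#   smartctl -A /dev/hda | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/hda | smart_raw Current_Pending_Sector
```

A non-zero raw value for either attribute is usually the point at which to start worrying about the disk.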

I got similar results for my external Western Digital Elements hard disk. That was good, because that's my backup disk. Phew.

My other external hard disk is an old Western Digital drive in a USB-connected disk enclosure. I tried diagnosing it and got the following error:
root@machine-cycle:~# smartctl -a /dev/sda
smartctl 5.39 2009-10-10 r2955 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen,

/dev/sda: Unsupported USB bridge [0x04b4:0x6830 (0x001)]
Smartctl: please specify device type with the -d option.

Use smartctl -h to get a usage summary
I did as I was told (i.e. read the usage summary) and then tried the following:
smartctl -a -d usbcypress /dev/sda
Cypress happens to be the manufacturer of this enclosure's USB to IDE bridge, but smartctl doesn't seem to recognize it without my help.
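
The bracketed numbers in the error message are the USB vendor and product IDs of the bridge, and 0x04b4 is the vendor ID registered to Cypress Semiconductor. As a rough sketch, they can be pulled out of the message and handed to lsusb to see what the bridge calls itself. The function name usb_id_from_error is made up for this example.

```shell
# Hypothetical helper: extract "vendor:product" from smartctl's
# "Unsupported USB bridge" message, read on stdin.
usb_id_from_error() {
    sed -n 's/.*Unsupported USB bridge \[0x\([0-9a-f]\{4\}\):0x\([0-9a-f]\{4\}\).*/\1:\2/p'
}

# Usage, not run here:
#   id=$(smartctl -a /dev/sda 2>&1 | usb_id_from_error)
#   lsusb -d "$id"
```

Knowing the manufacturer makes it easier to pick a matching -d type from the usage summary.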

Well, now I got a report from smartctl but it showed that one DMA CRC error was logged.

I ran short self-tests on all hard disks with smartctl -t short ... for each device, and they all completed successfully. Phew. /Me wiping cold sweat off brow/
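
For the record, a self-test runs inside the drive in the background, so starting it and reading the result are two separate steps. Here is a minimal sketch of both. The function names and the SMARTCTL variable are my own invention; -t short and -l selftest are regular smartctl options.

```shell
# Command used to talk to the drives; set SMARTCTL=echo for a dry run.
SMARTCTL=${SMARTCTL:-smartctl}

# Start a short self-test on one device. Extra arguments (such as
# "-d usbcypress") are passed through before the device name.
start_short_test() {
    dev=$1
    shift
    $SMARTCTL -t short "$@" "$dev"
}

# Print the drive's self-test log; call this once the test has finished.
show_selftest_log() {
    dev=$1
    shift
    $SMARTCTL -l selftest "$@" "$dev"
}

# Usage (as root), not run here:
#   start_short_test /dev/hda
#   start_short_test /dev/sda -d usbcypress
#   show_selftest_log /dev/hda
```

smartctl prints an estimate of how long the test will take when it starts one, so that is how long to wait before reading the log.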

Next thing to do was to enable smartd to monitor all my hard disks:
  1. edit /etc/default/smartmontools and make sure you have the following line in it:
  2. start the daemon:
    invoke-rc.d smartmontools start
On my system this didn't work as is, so I had to edit the daemon configuration file /etc/smartd.conf, based on the examples in its comments and the manual page:
  1. comment out the line that starts with DEVICESCAN (i.e. prefix it with a hash sign #)
  2. add a line for each hard disk to be monitored:
    # primary disk
    /dev/hda -a -o on -S on -s (S/../.././05|L/../../7/04) -m root -M exec /usr/share/smartmontools/smartd-runner
    # /dev/gigapod (multimedia)
    /dev/disk/by-path/pci-0000:02:00.2-usb-0:1.4:1.0-scsi-0:0:0:0 -a -d usbcypress -d removable -o on -S on -s (S/../.././05|L/../../7/04) -m root -M exec /usr/share/smartmontools/smartd-runner
    # /dev/elements (backup)
    /dev/disk/by-path/pci-0000:02:00.2-usb-0:1.1:1.0-scsi-0:0:0:0 -a -d sat -d removable -o on -S on -s (S/../.././05|L/../../7/04) -m root -M exec /usr/share/smartmontools/smartd-runner
    Note that this schedules short self-tests to run every morning at 5 AM and long self-tests to run on Sunday mornings at 4 AM.
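
The argument to -s is a regular expression that smartd matches against a string of the form T/MM/DD/d/HH: test type, month, day of month, day of week (1 is Monday, 7 is Sunday) and hour. The sketch below mimics that matching with grep so that a schedule can be tried out before smartd gets to see it. is_due is a name made up for this example, and using grep -x for a whole-string match is my assumption about how smartd applies the expression.

```shell
# The schedule from the smartd.conf lines above.
schedule='(S/../.././05|L/../../7/04)'

# Hypothetical helper: succeed if a test of the given type is due.
# Arguments: type (S or L), month (01-12), day of month (01-31),
# weekday (1 = Monday ... 7 = Sunday), hour (00-23).
is_due() {
    printf '%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" | grep -Eqx "$schedule"
}

# Example: report which tests would be due in the current hour.
for type in S L; do
    if is_due "$type" "$(date +%m)" "$(date +%d)" "$(date +%u)" "$(date +%H)"; then
        echo "$type test due now"
    fi
done
```

For instance, is_due S 11 06 5 05 succeeds (a Friday at 5 AM), while is_due L 11 06 5 04 fails because Friday is not day 7.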

    Also note that I use the /dev/disk/by-path links for the external disks' block devices, so that I'm not hit by udev's tendency to reorder device names.
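
To find out which by-path link belongs to a disk, resolve the links and compare them with the device node the disk currently has. This is a minimal sketch: links_for is a made-up name, and it relies on readlink -f as found in GNU coreutils.

```shell
# Hypothetical helper: print every symlink in a directory that resolves
# to the given device node. The directory defaults to /dev/disk/by-path.
links_for() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Usage, not run here:
#   links_for /dev/sda
#   links_for /dev/sda /dev/disk/by-id
```

Keep in mind that a by-path link describes the port the disk is plugged into, so it only stays valid as long as the disk stays on the same USB port.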

Testing, 1, 2, 3 !
