July 2006 Archives

I find it a little interesting that, so shortly after reading rentzsch.com: Surviving I/O Errors, I had a similar experience.

I just purchased a second HD for my desktop machine (from 3btech after Jared pointed me to them and noted his good past experience both with ordering and using the drives in his backup machine).

My plan was to copy my $HOME and locally installed applications to the new disk, thus making things easier to back up (all of the must-save stuff will be on the new disk, so I can just image/rsync/whatever off of it). The old disk will have the OS and some other easily restorable stuff (like my darwinports installation).

Of course, things didn't end up being that easy.

My $HOME copied over fine, but I got I/O errors when trying to copy one of the directories inside the .app bundle of one of my applications.

I copied the rest over, and restored the application from the original install disk image.

Of course, I got I/O errors when trying to remove the directory from the now-unneeded application.

Comfortable in the knowledge that I had backed up everything I needed (so I could wipe and re-install, or junk the drive and re-install if necessary), I then proceeded to attempt to fix things 'in-place'.

Since I have smartmontools installed, I could easily get the LBA that was having problems.

From this (and pdisk), I calculated the seek I would need to give dd to have it write zeros to this block.
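The arithmetic behind that seek value can be sketched roughly as follows. This is only an illustration, not the exact calculation I did: it assumes 512-byte sectors, that the LBA smartctl reports is counted from the start of the whole disk, and that the partition's starting block (as listed by pdisk) is in the same units. The function name and the numbers are made up for the example.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def dd_seek_for_lba(bad_lba, partition_start_lba=0, dd_block_size=SECTOR_SIZE):
    """Return the dd seek= value (in units of dd_block_size) for a bad sector.

    bad_lba             -- failing LBA, counted from the start of the disk
    partition_start_lba -- first LBA of the partition being written to
                           (leave at 0 when writing to the whole-disk device)
    dd_block_size       -- the bs= value that will be passed to dd
    """
    if bad_lba < partition_start_lba:
        raise ValueError("bad LBA lies before the start of the partition")
    # Byte offset of the bad sector relative to the device dd will write to.
    byte_offset = (bad_lba - partition_start_lba) * SECTOR_SIZE
    # dd counts seek= in output blocks, so convert bytes to blocks.
    return byte_offset // dd_block_size


# Hypothetical numbers, not the ones from my drive.
seek = dd_seek_for_lba(bad_lba=1_000_000, partition_start_lba=262_208)
print(f"dd if=/dev/zero of=<partition device> bs={SECTOR_SIZE} seek={seek} count=1")
```

With bs=512 and count=1, only the one sector gets overwritten. A larger bs would zero the neighbouring sectors in the same block as well, which is worth double-checking before pressing return.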

I then booted off of my Mac OS X 10.4 install DVD (into single user mode), forced dd to write zeros to this block, and ran fsck_hfs on the partition (with my fingers crossed).

It took a while: it ran three times and then aborted. I ran it again; this time it ran twice and claimed to have repaired the disk.

Cautiously optimistic, I booted from the now-repaired disk and had a look around. It seems I was lucky (I found the missing directories from the .app bundle in lost+found, but didn't need them, so I removed them).

smartctl -a /dev/disk0 doesn't show any new I/O errors, but I'll keep monitoring it (with smartd, and manually).

So far, though, so good.

Update: There was another nearby sector that was bad too (it was causing short and long SMART self tests to fail). Current_Pending_Sector and Offline_Uncorrectable both had a RAW_VALUE of 1 according to smartctl. Filling all the empty space on the drive with zeros (dd if=/dev/zero of=/tmp/hugefile bs=4096) seems to have fixed it (as both now have values of zero). Both the short and long tests pass now.

Curiously, Reallocated_Sector_Ct was 1 after I zeroed the first problem block (as expected) but it's now sitting at zero.

I'll have to continue to closely monitor the situation.
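For the manual side of that monitoring, the three counters mentioned above can be pulled out of the attribute table that smartctl -A prints. The snippet below is a minimal sketch that assumes the usual ten-column table layout (ID#, ATTRIBUTE_NAME, ..., RAW_VALUE). The function name is invented for the example, and the sample text is illustrative rather than real output from my drive.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def raw_values(smartctl_output, names=WATCHED):
    """Map each watched attribute name to its integer RAW_VALUE."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the header row and any other text are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = int(fields[9])
    return found


# Illustrative sample in the shape of `smartctl -A` output (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

for name, value in raw_values(SAMPLE).items():
    flag = "" if value == 0 else "  <-- worth a closer look"
    print(f"{name}: {value}{flag}")
```

In practice the text would come from running smartctl -A against the disk and capturing its output; smartd remains the better tool for unattended monitoring.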

This blog is licensed under a Creative Commons License.