geeklair.net recovery

The short version: geeklair.net went down hard yesterday and I had to wipe+restore it. I had to restore user directories and websites from my April backup (from when I moved machines). I did have a more recent backup of the weblogs, so they are more up to date.

In any event, the websites are back up and Movable Type is back up (but without the threaded comments patch, which may or may not make a comeback). And, most importantly, email is back up.

Email is back up and running.

Before reconnecting an IMAP client, it might be wise to look through the local cache and save any messages that are important (as they'll be deleted when the IMAP client connects and sees the old mailboxes on the server).
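If you'd rather not pick through messages by hand, copying the whole local cache somewhere safe before the client reconnects works too. Here's a minimal sketch in Python; the function name and the destination folder are made up for illustration, and the cache location depends on your mail client (for Mail.app it is normally ~/Library/Mail, but check yours).

```python
import shutil
import time
from pathlib import Path


def snapshot_mail_cache(cache_dir, dest_root):
    """Copy a mail client's local cache into a new timestamped folder.

    Both arguments are assumptions supplied by the caller: cache_dir is
    wherever your client keeps its local copies, dest_root is any folder
    outside the client's control.
    """
    cache_dir = Path(cache_dir).expanduser()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root).expanduser() / f"mail-cache-{stamp}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # snapshot can never be clobbered by accident.
    shutil.copytree(cache_dir, dest)
    return dest
```

Quit the mail client first, then run something like snapshot_mail_cache("~/Library/Mail", "~/mail-rescue") and only reconnect once the copy has finished.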

Thanks to jared, I've also got a new backup setup (where I'll have a complete copy of the system backed up nightly to a remote machine) in case this ever happens again.

Read on for more details.

Monday, I set up a new cron job to automatically generate the portindex for darwinports (this was the only major change to the machine I made that day).

On Tuesday morning, perhaps due to the above (and some time after a 4:30AM rsync of some of my important data for backups), the system crashed hard.

I decided to take this opportunity to also upgrade the machine to 10.4 (since I needed to go to the colo where the machine was to reboot it anyway).

Instead of doing the 'smart' thing and just connecting the available monitor and the keyboard/mouse that I brought to the machine to do the upgrade, I decided to be clever and boot the machine into FireWire target disk mode and do the install from my laptop.

Of course, I forgot that I had set up the machine as an AppleRAID software mirror, and that FireWire target disk mode does 'really bad' (tm) things to that setup.

So, the install looked like it went fine, but then the machine wouldn't boot. Connecting a monitor revealed that the RAID was degraded (because of target disk mode). Launching Disk Utility from the install DVD gave me the option to 'rebuild' it. Of course, about 10 seconds into rebuilding the array, it spit out an error.

At this point, I realized that I was probably going to lose data.

I attempted to rebuild the RAID from the command line, but it didn't work.

I broke the RAID with the GUI, thinking I could mount the individual drives and at least salvage the data, but that was a mistake ... nothing would mount at all, and you can't re-create the RAID without reformatting.

At this point, I took the machine out of the rack and back to my house.

I used pdisk to re-write the partition map to make the RAID partitions look like HFS+ partitions in the hope of mounting them. This didn't work, but it did allow DiskWarrior to see one of the disk's partitions. I ran DiskWarrior for a few hours and it looked like it was able to repair the disk. Unfortunately, it re-created most of the directory structure but put all of the files in a single "Recovered Items" folder in /.

At which point, I gave up and started rebuilding the machine.

I got it mostly ready by about 2:00 AM last night, and at 8:30AM this morning went and put it back in Voyager's (CoreComm's) co-lo room.

The rsync backup for the entire machine is running now (it'll probably take a couple of days for the first run) and will be running nightly after that, so this shouldn't happen again in the future.
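For anyone wanting to set up something similar, here is a minimal sketch of a nightly rsync-over-ssh mirror, written in Python so it can be dropped into a cron job. It is not the actual job running on geeklair.net; the function names, host name, and paths are all placeholders, and the flag set is just a reasonable starting point.

```python
import subprocess


def build_rsync_command(source, remote_host, remote_path, dry_run=False):
    """Build the rsync argument list for mirroring source to a remote machine.

    remote_host and remote_path are hypothetical placeholders; substitute
    the backup machine and directory you actually use.
    """
    cmd = [
        "rsync",
        "-a",         # archive mode: recurse, keep permissions, times, symlinks
        "--delete",   # drop files from the mirror that were deleted locally
        "-e", "ssh",  # run the transfer over ssh
    ]
    if dry_run:
        cmd.append("-n")  # only report what would change, copy nothing
    cmd.append(source)
    cmd.append(f"{remote_host}:{remote_path}")
    return cmd


def run_backup(source, remote_host, remote_path, dry_run=False):
    """Run the mirror; raises CalledProcessError if rsync exits non-zero."""
    cmd = build_rsync_command(source, remote_host, remote_path, dry_run)
    return subprocess.run(cmd, check=True)
```

Calling run_backup("/Users/", "backup.example.com", "/backups/Users/", dry_run=True) first is a cheap way to see what a real run would do. Note that --delete makes the remote copy an exact mirror, so a file deleted by mistake disappears from the backup on the next run; leave it off if you would rather keep everything.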

My personal stuff was backed up, but I'm not sure about the other geeklair.net users, so I feel pretty bad about the whole thing.

Now I just have to go through all my email (I lost flags on my messages when I restored my mboxes from the Mail.app cached files) and finish setting up the things on the box that still aren't set up.


3 Comments

Nice job on the recovery. You had it back up and running pretty quickly.

Thanks, I'm still a bit miffed that I lost data, but I suppose I did the best I could with what I had.

I can understand that. Losing data is frustrating as hell. Just recently I lost part of the backup on one application server due to a service failure. It just so happened that the data was NEEDED the next day due to ANOTHER issue. Pissed me right off.

This blog is licensed under a Creative Commons License.