November 24th, 2007

Friday afternoon I decided to install a package on one of my OpenBSD servers, but the package was built for a recent snapshot and the snapshot running on the server was too old for it. No problem, I'll just upgrade the server. It's usually a quick task: drop a new kernel into /, reboot, untar the new install sets over /, run mergemaster, and reboot again.

Remotely rebooting a server that is 350 miles away is always a nerve-racking experience. You reboot it, your SSH connection drops, and you start a ping and wait for it to reply, picturing the machine booting up and thinking about how long each stage usually takes. Occasionally something takes longer than normal and you start to panic, but before you can reach whoever you need to reach, it starts responding, a wave of relief washes over you, and you get back to work.

But such was not the case on Friday. After 10 minutes, my server had still not come back up.

A frustrating 8 hours later, I finally had remote access to the server's console through the HP LOM interface and could see why it wasn't coming back up. (Why it took 8 hours to get access to a leased server's embedded remote management console in a supposedly manned datacenter on a Friday afternoon is another story.) The machine was stalling at boot with an ominous "No O/S" error. I assumed that the partition table or boot loader had somehow been corrupted, but since I couldn't boot the server from a CD or floppy from 350 miles away, there wasn't much I could do through the remote console.

Luckily, a PXE boot server was available on the server's network, so I could at least boot a bsd.rd image and try to fix the disk. Unfortunately, the LOM console chose that moment to call it quits and stopped responding on the network. Great.

Another 16 hours later (!), someone finally reconfigured the LOM, and I had the server booted into a ramdisk image to try to rescue its data. During the downtime I had migrated some of the downed services to other servers from backups, but I'd rather spend the time getting the original system back up than restore a few-days-old backup onto a new machine.

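A quick look at the partition table and disklabel showed the problem: the disklabel had mysteriously been reinitialized, leaving a single "a" partition of type "unused" spanning the whole OpenBSD MBR partition: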

# fdisk sd0
Disk: sd0       geometry: 9728/255/63 [156280320 Sectors]
Offset: 0       Signature: 0xAA55
         Starting       Ending       LBA Info:
 #: id    C   H  S -    C   H  S [       start:      size   ]
------------------------------------------------------------------------
 0: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
*3: A6    0   1  1 - 9727 254 63 [          63:   156280257 ] OpenBSD

# disklabel sd0
# Inside MBR partition 3: type A6 start 63 size 156280257
[ ... ]
16 partitions:
#             size        offset  fstype [fsize bsize  cpg]
  a:     156280257            63  unused      0     0      # Cyl     0*- 76308*
  c:     156291072             0  unused      0     0      # Cyl     0 - 76313 

Normally I would have just restored the disklabel from an off-site backup of the server's /var/backups/disklabel.sd0.current file, but just to make things more interesting, I had no backup of /var/backups available.

Having been in a similar situation with my laptop years back, I remembered using the scan_ffs utility to rebuild the disklabel manually. scan_ffs scans the raw disk device (/dev/rsd0c) for FFS superblocks and reports the size and offset of each filesystem it finds; with -l, it prints them in a format that looks much like disklabel input. However, scan_ffs isn't on the installation ramdisk, and since the disk's partitions couldn't be mounted without a valid disklabel, it couldn't be run from the installed system either. But then again, it's an installation ramdisk, so it does have ifconfig, route and ftp. After manually configuring the Ethernet interface and deleting some large files from the ramdisk's in-memory filesystem to make room, scan_ffs joined the party:

# ftp -o - ftp://209.242.32.10/pub/OpenBSD/4.1/amd64/base41.tgz | \
    tar xvzf - ./sbin/scan_ffs
# /sbin/scan_ffs -l /dev/rsd0c
X: 4194240 47 4.2BSD 2048 16384 323 # /
X: 4194240 63 4.2BSD 2048 16384 323 # /
X: 1048576 5242880 4.2BSD 2048 16384 323 # /tmp
X: 4194304 6291456 4.2BSD 2048 16384 323 # /var
X: 8388608 10485760 4.2BSD 2048 16384 323 # /usr
X: 8388608 18874368 4.2BSD 2048 16384 323 # /home
X: 5760 25313664 4.2BSD 512 4096 80 # /mnt
X: 2880 25453408 4.2BSD 512 4096 80 # /mnt
X: 129017344 27262976 4.2BSD 2048 16384 323 # /d
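Not every hit is a real partition, so the numbers need weeding before they go into a label. A minimal sketch of that weeding, assuming the scan_ffs -l line format shown above with sizes and offsets in 512-byte sectors: sort the hits by offset, keep one only if it starts at or after the end of the last one kept (and not before the OpenBSD MBR partition), and report any unclaimed sectors as gaps. The function name pick_ffs and its two arguments (the MBR partition's start and size, 63 and 156280257 in the fdisk output above) are hypothetical, not part of OpenBSD.

```sh
# pick_ffs: hypothetical helper, not part of OpenBSD.
# Reads `scan_ffs -l` style lines ("X: size offset 4.2BSD ... # mountpoint")
# on stdin and prints a candidate layout in 512-byte sectors:
#   ffs <offset> <size> <last mount point>   for each hit kept
#   gap <offset> <size>                      for each unclaimed range
# $1 = first sector of the OpenBSD MBR partition, $2 = its size in sectors.
pick_ffs() {
	# Sort hits by offset (field 3) so they can be walked front to back.
	sort -n -k 3,3 | awk -v start="$1" -v psize="$2" '
	BEGIN { end = start + 0 }	# first sector not yet claimed
	$1 == "X:" {
		size = $2 + 0; off = $3 + 0
		# Starts before the partition, or inside a hit already kept
		# (e.g. a filesystem image stored as a file): skip it.
		if (off < end) next
		if (off > end) printf "gap %.0f %.0f\n", end, off - end
		printf "ffs %.0f %.0f %s\n", off, size, $NF
		end = off + size
	}
	END {
		# Report anything left over at the end of the MBR partition.
		if (end < start + psize)
			printf "gap %.0f %.0f\n", end, start + psize - end
	}'
}
```

Saving the scan with /sbin/scan_ffs -l /dev/rsd0c > /tmp/ffs.txt and then running pick_ffs 63 156280257 < /tmp/ffs.txt on any machine with sort and awk should list / at 63, a 1048577-sector gap at 4194303, and then /tmp, /var, /usr, /home and /d back to back. The greedy rule keeps whichever hit starts first, which is why the partition start acts as a floor: without it, the stray / at 47 would win over the real one at 63.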

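Skipping the hits that didn't fit (the / at offset 47, which starts before the OpenBSD MBR partition at sector 63, and the two small /mnt filesystems, which lie inside the /home range) and putting swap in the unclaimed gap between / and /tmp, I recreated the disklabel: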

[ ... ]
#             size        offset  fstype [fsize bsize  cpg]
  a:       4194240            63  4.2BSD   2048 16384   16 # Cyl     0*-  2047*
  b:       1048577       4194303    swap                   # Cyl  2047*-  2559 
  c:     156291072             0  unused      0     0      # Cyl     0 - 76313 
  d:       1048576       5242880  4.2BSD   2048 16384   16 # Cyl  2560 -  3071 
  e:       4194304       6291456  4.2BSD   2048 16384   16 # Cyl  3072 -  5119 
  f:       8388608      10485760  4.2BSD   2048 16384   16 # Cyl  5120 -  9215 
  g:       8388608      18874368  4.2BSD   2048 16384   16 # Cyl  9216 - 13311 
  h:     129017344      27262976  4.2BSD   2048 16384   16 # Cyl 13312 - 76308*
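Typing a label in by hand invites off-by-one mistakes, and the b partition here is exactly that kind of number: 5242880 - (63 + 4194240) = 1048577 sectors. A quick sanity check, sketched under the assumption that partition lines look like the disklabel output above, is to confirm that everything except the whole-disk c partition tiles the OpenBSD MBR partition exactly, from sector 63 up to 63 + 156280257 = 156280320, with no gaps or overlaps. check_label is a hypothetical helper, not an OpenBSD command; it prints "ok" and returns 0 when the label fits, and otherwise lists each mismatch and returns 1.

```sh
# check_label: hypothetical helper, not part of OpenBSD.
# Reads disklabel partition lines ("  x:  size  offset  fstype ...") on
# stdin, ignores the whole-disk c partition, and checks that the rest
# exactly tile $2 sectors starting at sector $1 (the OpenBSD MBR
# partition). Prints "ok" and returns 0, or lists mismatches and returns 1.
check_label() {
	# Keep partition lines only, as "offset size letter:", sorted by offset.
	awk '$1 ~ /^[a-p]:$/ && $1 != "c:" { print $3, $2, $1 }' |
	sort -n -k 1,1 |
	awk -v start="$1" -v psize="$2" '
	BEGIN { end = start + 0; bad = 0 }	# end = where the next one must start
	{
		if ($1 + 0 != end) {
			printf "%s starts at %.0f, expected %.0f\n", $3, $1, end
			bad = 1
		}
		end = $1 + $2
	}
	END {
		if (end != start + psize) {
			printf "label ends at %.0f, expected %.0f\n", end, start + psize
			bad = 1
		}
		if (!bad) print "ok"
		exit bad
	}'
}
```

Saving the edited label to a file and running check_label 63 156280257 < label.txt on the layout above prints ok; bumping b's offset by one sector instead reports both b and d as misplaced, since d no longer starts where b ends.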

I reinstalled the boot blocks for good measure, rebooted, and was back in business after almost 30 hours of downtime. I'm still not sure how the on-disk disklabel vanished, though.

So the morals of the story are:

  1. When leasing a server from or co-locating with a third party, make sure the datacenter housing the server is actually staffed 24/7, even if the third party claims it can get access on demand. That way, if the third party can't get to the datacenter quickly, it can at least call in a reboot or remote-hands request to the datacenter's 24/7 staff.
  2. Before remotely rebooting a server, make sure its remote console is actually reachable and usable. Better yet, check it every so often, since you might need it after an unexpected crash.
  3. Always have a backup of /var/backups available (for OpenBSD machines, anyway), even if the rest of the system isn't worth backing up. Hell, put "shar /var/backups/*.current | mail -s backups root" in /etc/weekly.local and just archive the e-mails.
  4. The man page for scan_ffs does not lie:

    It is not perfect, and could do a lot more things with date/time information in the superblocks it finds, but this program has saved more than one butt, more than once.
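For moral 3, a minimal /etc/weekly.local along those lines might look like the fragment below. It is a sketch, not a tested setup: it assumes root's mail is forwarded off the machine (otherwise the copy dies with the disk), and it archives from inside /var/backups so the archive records bare file names and unpacks into whatever directory it is run from. The subject line is just an example.

```sh
# /etc/weekly.local (sketch): run by OpenBSD's /etc/weekly when present.
# Mails a shell archive of the *.current files in /var/backups, such as
# disklabel.sd0.current, to root.
# Assumption: root's mail is forwarded off this machine; otherwise the
# copy is lost along with the disk.
# The subshell keeps the cd from leaking into the calling script, and
# archiving from inside the directory stores bare file names.
(
	cd /var/backups &&
	shar *.current | mail -s "$(hostname) /var/backups" root
)
```

To get a label back from one of those e-mails, unpack the archive with sh in an empty directory and hand the saved file to disklabel's -R option, e.g. disklabel -R sd0 disklabel.sd0.current; check disklabel(8) on your release before trusting a restore to it.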
