Fedora Core 5 LV Disk recovery or When the proverbial S..t hits the fan

 

My case:

A loose power cable on a 160 GB drive eventually caused the drive to stop running, and the file system was seriously damaged. When I managed to get the drive back online and ran fsck, it took about an hour and a half for all the corrections to be applied. I had to run fsck several times before I had a clean drive again.

All data was recovered.

 

Just to add a few things to the excellent article below:

I’m assuming a standard installation as set up by the automatic partitioning of the Fedora Core 5 installer, with no RAID.

Understand that it’s not /dev/hda2 that you are going to be addressing but /dev/VolGroup00/LogVol00. In a standard Fedora Core 5 installation the volume group VolGroup00 sits on /dev/hda2 and holds both the root file system (LogVol00) and the swap device.

Note that the article below was written for a somewhat older Fedora version, so not everything is exactly as described, but it’s close enough that you should be able to figure things out. If you cannot, email me, Skype me or message me.

email: Anthony.Dawson@thelasis.com 
Skype ID: aegdawson
ICQ ID: 23-227-727
Messenger: aegdawson
MSN: anthony.dawson@hotmail.com

Any fsck -yf should be run against the LV, not against the raw partition. When you are done with your recovery process you will probably want to do a clean-up.

Also, if the file system superblock is totally screwed, you will need to use the -b option to select an alternative superblock backup copy (of which there are many throughout the disk). If you don’t know where they are, use:

mke2fs -n /dev/hda2

This will list a whole bunch of superblock copies. The -n option makes mke2fs only report what it would do; it does not create a file system.
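To make that list easier to feed into fsck, here is a minimal sketch that pulls the block numbers out of the mke2fs -n report. The function name list_backup_superblocks is my own invention, and the parsing assumes the report contains a "Superblock backups stored on blocks:" line followed by comma-separated numbers, which is what e2fsprogs normally prints.

```shell
#!/bin/sh
# Sketch only: read the output of `mke2fs -n DEVICE` on stdin and print
# one backup superblock number per line.
# list_backup_superblocks is a hypothetical helper, not part of e2fsprogs.
list_backup_superblocks() {
    awk '/Superblock backups stored on blocks:/ { found = 1; next }
         found && /^[ \t]*[0-9]/ { print; next }   # continuation lines with numbers
         found { exit }                            # first non-numeric line ends the list
        ' | tr -d ' \t' | tr ',' '\n' | grep -E '^[0-9]+$'
}
```

Typical use would be mke2fs -n /dev/hda2 | list_backup_superblocks, then trying the numbers one at a time with fsck -yfb until one of them works.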

Apart from the rest, I had to issue a pvcreate command within the lvm environment in order to set up the LV table which I had previously deleted in ignorance…

Also note that the article renames the logical device so as not to conflict with the sane installation. I booted the Linux system off the first installation disk in “linux rescue” mode and repaired the disk to the point where I could finally boot the system up again.

In my case the process was the following:

1)    Boot off the Fedora boot disk, press F5 and type in linux rescue

2)    Use parted to check that the partitions still exist

3)    fsck -yf /dev/hda1 (note that /dev/hda1 is not an LV)

4)    Use dd to extract the volume information

5)    Use vi to edit the volume information and create the VolGroup00 configuration file

6)    Use pvcreate with the right drive ID to label the volume with the correct UUID, which I got from the extracted volume information

7)    Use vgcfgrestore to restore the volume description

8)    Use vgchange to make the volume active

9)    Use fsck -yf /dev/VolGroup00/LogVol00 to repair the file system

10)   If that fails, use mke2fs -n /dev/hda2 to find the location of a superblock backup

11)   Then use fsck -yfb nnnn /dev/VolGroup00/LogVol00, where nnnn is the block number of the alternative superblock

12)   Reboot
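The LVM part of the steps above can be sketched roughly as follows. This is a dry run: it only prints the commands so you can check them before typing anything. The function name print_recovery_plan, the /tmp file names and the dd offsets (bs=512 count=255 skip=1) are assumptions for illustration, not values I verified on every system. The configuration file is the one you create by hand with vi from the dd output, and in the rescue environment you may have to prefix the LVM commands with lvm.

```shell
#!/bin/sh
# Dry-run sketch: print the recovery commands instead of running them.
# Arguments (all illustrative): PV device, PV UUID, hand-edited metadata file.
print_recovery_plan() {
    pv_dev=$1
    pv_uuid=$2
    cfg=$3
    vg=VolGroup00               # default volume group name on Fedora
    lv=/dev/$vg/LogVol00        # root logical volume
    cat <<EOF
dd if=$pv_dev bs=512 count=255 skip=1 of=/tmp/pv-raw-start
pvcreate --uuid $pv_uuid --restorefile $cfg $pv_dev
vgcfgrestore -f $cfg $vg
vgchange -ay $vg
fsck -yf $lv
EOF
}
```

For example, print_recovery_plan /dev/hda2 YOUR-PV-UUID /tmp/VolGroup00 prints the five commands in order. The dd line comes first because the configuration file is built from what it extracts.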

 

If you’re lucky you’re back in business :-)

 

 

Recovery of RAID and LVM2 Volumes

By Richard Bullington-McGuire on Fri, 2006-04-28 01:00. Software

RAID and Logical Volume Managers are great, until you lose data.

The combination of Linux software RAID (Redundant Array of Inexpensive Disks) and LVM2 (Logical Volume Manager, version 2) offered in modern Linux operating systems offers both robustness and flexibility, but at the cost of complexity should you ever need to recover data from a drive formatted with software RAID and LVM2 partitions. I found this out the hard way when I recently tried to mount a system disk created with RAID and LVM2 on a different computer. The first attempts to read the filesystems on the disk failed in a frustrating manner.

I had attempted to put two hard disks into a small-form-factor computer that was really designed to hold only one hard disk, running the disks as a mirrored RAID 1 volume. (I refer to that system as raidbox for the remainder of this article.) This attempt did not work, alas. After running for a few hours, it would power off with an automatic thermal shutdown failure. I already had taken the system apart and started re-installing with only one disk when I realized there were some files on the old RAID volume that I wanted to retrieve.

Recovering the data would have been easy if the system did not use RAID or LVM2. The steps would have been to connect the old drive to another computer, mount the filesystem and copy the files from the failed volume. I first attempted to do so, using a computer I refer to as recoverybox, but this attempt met with frustration.

Why Was This So Hard?

Getting to the data proved challenging, both because the data was on a logical volume hidden inside a RAID device, and because the volume group on the RAID device had the same name as the volume group on the recovery system.

Some popular modern operating systems (for example, Red Hat Enterprise Linux 4, CentOS 4 and Fedora Core 4) can partition the disk automatically at install time, setting up the partitions using LVM for the root device. Generally, they set up a volume group called VolGroup00, with two logical volumes, LogVol00 and LogVol01, the first for the root directory and the second for swap, as shown in Listing 1.

Listing 1. Typical LVM Disk Configuration

[root@recoverybox ~]# /sbin/sfdisk -l /dev/hda

Disk /dev/hda: 39560 cylinders, 16 heads, 63 sectors/track

Warning: The partition table looks like it was made

  for C/H/S=*/255/63 (instead of 39560/16/63).

For this listing I'll assume that geometry.

Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System

/dev/hda1   *      0+     12      13-    104391   83  Linux

/dev/hda2         13    2481    2469   19832242+  8e  Linux LVM

/dev/hda3          0       -       0          0    0  Empty

/dev/hda4          0       -       0          0    0  Empty

[root@recoverybox ~]# /sbin/pvscan

  PV /dev/hda2   VG VolGroup00   lvm2 [18.91 GB / 32.00 MB free]

  Total: 1 [18.91 GB] / in use: 1 [18.91 GB] / in no VG: 0 [0   ]

[root@recoverybox ~]# /usr/sbin/lvscan

  ACTIVE            '/dev/VolGroup00/LogVol00' [18.38 GB] inherit

  ACTIVE            '/dev/VolGroup00/LogVol01' [512.00 MB] inherit

 

The original configuration for the software RAID device had three RAID 1 devices: md0, md1 and md2, for /boot, swap and /, respectively. The LVM2 volume group was on the biggest RAID device, md2. The volume group was named VolGroup00. This seemed like a good idea at the time, because it meant that the partitioning configuration for this box looked similar to how the distribution does things by default. Listing 2 shows how the software RAID array looked while it was operational.

Listing 2. Software RAID Disk Configuration

[root@raidbox ~]# /sbin/sfdisk -l /dev/hda

Disk /dev/hda: 9729 cylinders, 255 heads, 63 sectors/track

Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System

/dev/hda1   *      0+     12      13-    104391   fd  Linux raid autodetect

/dev/hda2         13      77      65     522112+  fd  Linux raid autodetect

/dev/hda3         78    9728    9651   77521657+  fd  Linux raid autodetect

/dev/hda4          0       -       0          0    0  Empty

[root@raidbox ~]# cat /proc/mdstat

Personalities : [raid1]

md2 : active raid1 hdc3[1] hda3[1]

      77521536 blocks [2/2] [UU]

md1 : active raid1 hdc2[1] hda2[1]

      522048 blocks [2/2] [UU]

md0 : active raid1 hdc1[1] hda1[1]

      104320 blocks [2/2] [UU]

 

If you ever name two volume groups the same thing, and something goes wrong, you may be faced with the same problem. Creating conflicting names is easy to do, unfortunately, as the operating system has a default primary volume group name of VolGroup00.
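One way to spot such a clash before it bites is to look for volume group names that appear on more than one physical volume. The following is a minimal sketch, assuming pvscan prints one "PV ... VG ..." line per physical volume in the format of Listing 1. The function name duplicate_vg_names is hypothetical.

```shell
#!/bin/sh
# Sketch only: read pvscan-style output on stdin and print every volume
# group name that is claimed by more than one physical volume.
# duplicate_vg_names is a hypothetical helper, not an LVM2 command.
duplicate_vg_names() {
    awk '$1 == "PV" {
             # the VG name is the field that follows the literal "VG"
             for (i = 2; i < NF; i++) if ($i == "VG") print $(i + 1)
         }' | sort | uniq -d
}
```

Running /sbin/pvscan | duplicate_vg_names with the old drive attached should print VolGroup00 if both disks carry the default name, and nothing if all names are unique.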

Restoring Access to the RAID Array Members

To recover, the first thing to do is to move the drive to another machine. You can do this pretty easily by putting the drive in a USB2 hard drive enclosure. It then will show up as a SCSI hard disk device, for example, /dev/sda, when you plug it in to your recovery computer. This reduces the risk of damaging the recovery machine while attempting to install the hardware from the original computer.

The challenge then is to get the RAID setup recognized and to gain access to the logical volumes within. You can use sfdisk -l /dev/sda to check that the partitions on the old drive are still there.
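When reading that partition listing, the Id column tells you what each partition holds: fd marks a software RAID member, 8e an LVM physical volume and 83 a plain Linux filesystem. A minimal sketch that labels each partition is shown here. It assumes the older sfdisk -l column layout used in Listings 1 and 2 (newer sfdisk versions print a different table), and the function name classify_partitions is made up for this example.

```shell
#!/bin/sh
# Sketch only: read legacy-format `sfdisk -l` output on stdin and print
# "device kind" for every partition line.
# classify_partitions is a hypothetical helper name.
classify_partitions() {
    awk '/^\/dev\// {
        # a "*" in the Boot column shifts the Id column one field right
        id = ($2 == "*") ? $7 : $6
        kind = "other"
        if (id == "8e") kind = "lvm"
        else if (id == "fd") kind = "raid"
        else if (id == "83") kind = "linux"
        else if (id == "0") kind = "empty"
        print $1, kind
    }'
}
```

For example, sfdisk -l /dev/sda | classify_partitions on the old RAID drive should report raid for the first three partitions, which is the cue to reach for mdadm rather than mount.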

To get the RAID setup recognized, use mdadm to scan the devices for their RAID volume UUID signatures, as shown in Listing 3.

Listing 3. Scanning a Disk for RAID Array Members

[root@recoverybox ~]# mdadm --examine --scan  /dev/sda1 /dev/sda2 /dev/sda3

ARRAY /dev/md2 level=raid1 num-devices=2

 UUID=532502de:90e44fb0:242f485f:f02a2565

   devices=/dev/sda3

ARRAY /dev/md1 level=raid1 num-devices=2

 UUID=75fa22aa:9a11bcad:b42ed14a:b5f8da3c

   devices=/dev/sda2

ARRAY /dev/md0 level=raid1 num-devices=2