How to replace a Volume in an mdadm software RAID array

May 4, 2022 Linux, Storage

mdadm, the Linux software RAID implementation, is in no way inferior to a typical hardware RAID controller and, just like a hardware controller, lets us swap physical disks inside a RAID array. It does require running a few commands, but the whole process is still pretty straightforward.

Usually we replace a disk in a RAID array when it starts failing, but there are also scenarios where you simply want to swap the mechanical SATA disks in a RAID array for SSDs, one by one, for better performance, without reinstalling the whole OS.
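
If the reason for the swap is a suspected failing drive, a quick SMART health check on the disk you plan to replace (here /dev/sdb) can confirm it before you start. This is optional and assumes the smartmontools package is installed:

[root@fixxxer ~]# smartctl -H /dev/sdb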

Our RAID1 setup is based on two physical disks, /dev/sda and /dev/sdb, and consists of two MD devices (arrays), /dev/md126 and /dev/md127:

[root@fixxxer ~]# cat /proc/mdstat
Personalities : [raid1] 
md126 : active raid1 sdb2[1] sda2[2]
      487252992 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md127 : active raid1 sdb1[1] sda1[2]
      999424 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

The /dev/md126 array consists of two corresponding partitions, /dev/sda2 and /dev/sdb2, and serves as an LVM Physical Volume:

[root@fixxxer ~]# pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/md126 fedora lvm2 a--  464,68g    0

The /dev/md127 array consists of two corresponding standard partitions, /dev/sda1 and /dev/sdb1, and is mounted in the system as the /boot directory:

[root@fixxxer ~]# df -hT | grep boot
/dev/md127              xfs       973M  294M  679M  31% /boot
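
If you want to see how the disks, partitions, MD devices and LVM volumes fit together at a glance, lsblk gives a quick overview (optional, just for orientation):

[root@fixxxer ~]# lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sda /dev/sdb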

In this tutorial, we are replacing the /dev/sdb drive in our RAID1 array (mirror) with a new disk and rebuilding the GRUB bootloader on the new disk.

Steps:

1. Mark disk partitions as failed

Because the /dev/sdb disk contains two partitions, each belonging to one of the two MD devices, we need to mark both partitions as FAILED before we can remove the disk from both arrays:

[root@fixxxer ~]# mdadm --manage /dev/md126 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md126
[root@fixxxer ~]# mdadm --manage /dev/md127 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md127

Both partitions from /dev/sdb are now marked with (F) and both arrays are in a degraded state:

[root@fixxxer ~]# cat /proc/mdstat
Personalities : [raid1] 
md126 : active raid1 sdb2[1](F) sda2[2]
      487252992 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md127 : active raid1 sdb1[1](F) sda1[2]
      999424 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

2. Remove disk partitions from the RAID arrays

[root@fixxxer ~]# mdadm --manage /dev/md126 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md126
[root@fixxxer ~]# mdadm --manage /dev/md127 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md127
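
For the record, mdadm can also take both operations in a single invocation, so steps 1 and 2 may be combined per partition (equivalent to running --fail and --remove separately):

[root@fixxxer ~]# mdadm /dev/md126 --fail /dev/sdb2 --remove /dev/sdb2
[root@fixxxer ~]# mdadm /dev/md127 --fail /dev/sdb1 --remove /dev/sdb1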

3. Replace the disk

Physically replace the /dev/sdb disk with the new one.
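
If you are not sure which physical drive is /dev/sdb, check its serial number before pulling it out of the enclosure. Either of the following works; the first assumes smartmontools is installed, the second relies on the by-id symlinks, which encode model and serial number:

[root@fixxxer ~]# smartctl -i /dev/sdb | grep -i serial
[root@fixxxer ~]# ls -l /dev/disk/by-id/ | grep sdb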

4. Create a partition table on the new disk

Copy the partition table from the existing disk /dev/sda to the new one, that is /dev/sdb:

[root@fixxxer ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
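
The sfdisk dump works for MBR labels and, with a reasonably recent util-linux, for GPT as well. If your disks use GPT, sgdisk from the gdisk package is an alternative; it needs a second step to randomize the copied GUIDs so they do not clash with the source disk (a sketch, assuming gdisk is installed):

[root@fixxxer ~]# sgdisk --replicate=/dev/sdb /dev/sda
[root@fixxxer ~]# sgdisk --randomize-guids /dev/sdb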

5. Recreate RAID1 mirrors using the new disk partitions

Now add the newly created partitions /dev/sdb1 and /dev/sdb2 to the corresponding RAID1 arrays, that is /dev/md127 and /dev/md126:

[root@fixxxer ~]# mdadm --manage /dev/md126 --add /dev/sdb2
mdadm: added /dev/sdb2
[root@fixxxer ~]# mdadm --manage /dev/md127 --add /dev/sdb1
mdadm: added /dev/sdb1

Warning: if the disk replacement required powering your computer off and on, the device names of your disks may have changed, so adjust the commands above accordingly.


6. Verify RAID1 mirror status

After we have added the new partitions to the existing arrays /dev/md126 and /dev/md127, the arrays will start to rebuild:

[root@fixxxer ~]# cat /proc/mdstat
Personalities : [raid1] 
md126 : active raid1 sdb2[3] sda2[2]
      487252992 blocks super 1.2 [2/1] [U_]
      [=====>...............]  recovery = 26.7% (130184064/487252992) finish=93.9min speed=63315K/sec
      bitmap: 4/4 pages [16KB], 65536KB chunk

md127 : active raid1 sdb1[3] sda1[2]
      999424 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
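
You can keep an eye on the rebuild without re-running the command by hand and, if the defaults feel too conservative, raise the kernel's minimum rebuild speed (the value is just an example, in KiB/s per device):

[root@fixxxer ~]# watch -n 5 cat /proc/mdstat
[root@fixxxer ~]# echo 100000 > /proc/sys/dev/raid/speed_limit_min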

Rebuilding takes a while, depending on the size of the array and the speed of the disks. Once it has finished, verify the status of both arrays:

[root@fixxxer ~]# mdadm --detail /dev/md126
/dev/md126:
           Version : 1.2
     Creation Time : Sun Sep 15 02:36:11 2019
        Raid Level : raid1
        Array Size : 487252992 (464.68 GiB 498.95 GB)
     Used Dev Size : 487252992 (464.68 GiB 498.95 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu May  5 19:14:46 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost-live:pv00
              UUID : e802093f:ef7988e5:4337bde0:662bf872
            Events : 73263

    Number   Major   Minor   RaidDevice State
       2       8        2        0      active sync   /dev/sda2
       3       8       18        1      active sync   /dev/sdb2
[root@fixxxer ~]# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Sun Sep 15 02:35:57 2019
        Raid Level : raid1
        Array Size : 999424 (976.00 MiB 1023.41 MB)
     Used Dev Size : 999424 (976.00 MiB 1023.41 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu May  5 18:55:49 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost-live:boot
              UUID : b9680eb2:2139f4ae:e099a845:e12657da
            Events : 232

    Number   Major   Minor   RaidDevice State
       2       8        1        0      active sync   /dev/sda1
       3       8       17        1      active sync   /dev/sdb1
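
If /etc/mdadm.conf exists on your system, it normally identifies the arrays by UUID, which does not change when a member disk is replaced, so no edit should be needed; you can compare it against the current state with:

[root@fixxxer ~]# mdadm --detail --scan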

7. Recreate the GRUB bootloader on the new disk

Both disks in our RAID1 array, /dev/sda and /dev/sdb, are bootable, so each of them should contain the GRUB bootloader. We therefore need to install GRUB on the new disk so that we can boot from it if the first disk fails:

[root@fixxxer ~]# grub2-install /dev/sdb
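
On Debian-based distributions the binary is typically called grub-install rather than grub2-install; the syntax is the same, so the equivalent command would be:

grub-install /dev/sdb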

After installing GRUB, it is a good idea to temporarily change the disk boot priority in the BIOS and try booting from /dev/sdb, just to confirm that the system will actually boot from it when needed.


Troubleshooting

On CentOS 7 you may encounter an error when trying to re-install GRUB after replacing a disk (for example /dev/sda) in an mdadm RAID array (I haven't noticed this issue on other distros so far):

[root@fixxxer ~]# grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: error: disk `mduuid/13a75c2dd7275e4aedb6fb4acd9b9f7a' not found.

A possible fix is to reassemble the MD device that holds the /boot directory.

Steps:

1. Find out which MD device holds your /boot directory

[root@fixxxer ~]# df -hT | grep /boot
/dev/md127              xfs       973M  294M  679M  31% /boot

2. Find out which partitions the device consists of

[root@fixxxer ~]# cat /proc/mdstat
Personalities : [raid1] 
md126 : active raid1 sda2[2] sdb2[3]
      487252992 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md127 : active raid1 sda1[2] sdb1[3]
      999424 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

3. Unmount the /boot directory

[root@fixxxer ~]# umount /boot
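
If umount refuses with a "target is busy" error, fuser (from the psmisc package) can show which processes still hold files open under /boot:

[root@fixxxer ~]# fuser -vm /boot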

4. Stop the MD device

[root@fixxxer ~]# mdadm --stop /dev/md127

5. Reassemble the MD device using the same partitions

[root@fixxxer ~]# mdadm --assemble --run /dev/md127 /dev/sda1 /dev/sdb1
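
Before reinstalling GRUB, mount /boot again so that grub2-install writes its files to the reassembled array rather than to an empty directory on the root filesystem (this assumes /boot is still listed in /etc/fstab):

[root@fixxxer ~]# mount /boot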

6. Install GRUB on both disks in the array

[root@fixxxer ~]# grub2-install /dev/sda
[root@fixxxer ~]# grub2-install /dev/sdb
