SRE/Dc-operations/Sw raid rebuild directions

When a defective disk in a software RAID (mdraid) array is swapped out, the array is not rebuilt automatically. Rebuilding requires adding the new disk back in with the following procedure (which uses the wdqs2007 disk replacement from this task as an example):
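
Before starting, it can help to confirm which array is degraded and which member is missing. The array and device names used below come from the wdqs2007 example and will differ on other hosts; a degraded array shows an underscore in the status brackets (e.g. [8/7] [UUUUUUU_]):

cat /proc/mdstat

sudo mdadm --detail /dev/md0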

check to see if the new disk is detected along with the existing disks:

sudo lshw -class disk
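
If the lshw output is hard to read, lsblk gives a more compact view; a freshly swapped disk typically shows up without the partitions its peers have (sdh here is just the device from this example):

sudo lsblk -o NAME,SIZE,TYPE,MOUNTPOINT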

copy the partition table of sda to sdh (sdh is the disk that was replaced); note that sgdisk -R takes the destination disk as its argument, followed by the source disk

sudo sgdisk -R /dev/sdh /dev/sda

create new random GUIDs for sdh (sgdisk -G randomizes the disk GUID and all partition GUIDs so they don't collide with sda's)

sudo sgdisk -G /dev/sdh

audit the output of both disks to ensure the partition layouts now match

sudo sgdisk -p /dev/sda

sudo sgdisk -p /dev/sdh
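
As an optional shortcut (not part of the original procedure), the two partition tables can also be diffed directly; after the previous step the disk identifier (GUID) lines are expected to differ, everything else should match:

diff <(sudo sgdisk -p /dev/sda) <(sudo sgdisk -p /dev/sdh)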

add the new SSD back into the array (the array members are the second partitions, so /dev/sdh2 rather than /dev/sdh)

sudo mdadm --manage /dev/md0 --add /dev/sdh2
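
If the add succeeds, the array detail should now list the new member, initially as a spare being rebuilt (again using the example device names):

sudo mdadm --detail /dev/md0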

In some cases, mdraid may automatically create a separate software RAID array and add the new disk to it. This will manifest as the --add command above returning

mdadm: Cannot open /dev/sdh2: Device or resource busy

as well as /proc/mdstat showing an inactive array that isn't md0, for instance:

cat /proc/mdstat 
Personalities : [raid1] [linear]  [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md127 : inactive sda2[7](S)
     937267200 blocks super 1.2
      
md0 : active raid1 sdb2[1]
     937267200 blocks super 1.2 [2/1] [_U]
     bitmap: 6/7 pages [24KB], 65536KB chunk

In that case, note the name of the inactive array (md127 here) and stop it

sudo mdadm --manage /dev/md127 --stop

then retry adding the device to /dev/md0
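
The retry is the same --add command as before, using the example device names from this task:

sudo mdadm --manage /dev/md0 --add /dev/sdh2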

check the status

robh@wdqs2007:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid10 sdh2[8] sda2[0] sde2[4] sdf2[5] sdg2[6] sdc2[2] sdd2[3] sdb2[1]
      3749068800 blocks super 1.2 512K chunks 2 near-copies [8/7] [UUUUUUU_]
      [>....................]  recovery =  0.0% (132736/937267200) finish=470.6min speed=33184K/sec
      bitmap: 28/28 pages [112KB], 65536KB chunk

unused devices: <none>
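
The rebuild runs in the background and can take several hours (the finish estimate above works out to roughly eight hours for this array). Progress can be followed with either of the following; the 60 second refresh interval is just an arbitrary choice:

watch -n 60 cat /proc/mdstat

sudo mdadm --detail /dev/md0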