SRE/Dc-operations/Sw raid rebuild directions

From Wikitech


When a defective disk is swapped out of a software RAID array, the array is not rebuilt automatically. The new disk has to be added to the array with the following procedure (the wdqs2007 disk replacement from this task is used as the example; sdh is the replaced disk and sda is a healthy member):

Check that the new disk is detected along with the existing disks:

sudo lshw -class disk

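Copy the partition table of sda to sdh (sdh was replaced). Argument order matters: the disk given directly after -R is the target that gets overwritten, and the last argument is the source.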

sudo sgdisk -R /dev/sdh /dev/sda

Randomize the disk and partition GUIDs on sdh so they no longer duplicate the ones copied from sda:

sudo sgdisk -G /dev/sdh

Print the partition tables of both disks and verify that the layouts now match (the disk GUIDs are expected to differ):

sudo sgdisk -p /dev/sda

sudo sgdisk -p /dev/sdh
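
Comparing the two listings by eye works, but the comparison can also be sketched as a small helper. The following is a minimal sketch, not part of the original procedure: the function names partition_rows and same_layout are hypothetical, and it assumes that the sgdisk -p output lists the partitions after a header line starting with "Number", and that both listings were first saved to files.

```shell
# Hypothetical helpers: compare two saved `sgdisk -p` listings.
# Assumption: partition rows follow a header line that starts with "Number".

# Print only the partition rows (everything after the "Number ..." header),
# which skips the per-disk lines such as the device name and disk GUID.
partition_rows() {
    awk 'seen { print } /^Number/ { seen = 1 }' "$1"
}

# Succeed (exit status 0) if both listings contain identical partition rows.
same_layout() {
    [ "$(partition_rows "$1")" = "$(partition_rows "$2")" ]
}

# Example use (file names are illustrative):
#   sudo sgdisk -p /dev/sda > sda.txt
#   sudo sgdisk -p /dev/sdh > sdh.txt
#   same_layout sda.txt sdh.txt && echo "layouts match"
```

Only the partition rows are compared because the header of each listing contains the device name and disk GUID, which legitimately differ between the two disks.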

Add the RAID partition of the new SSD (sdh2) back into the array:

sudo mdadm --manage /dev/md0 --add /dev/sdh2
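
The commands up to this point can be sketched as a dry-run helper that prints the sequence for a given pair of disks without executing anything, which makes it easier to review the source and target before touching a disk. This is an illustrative sketch only: the function name print_rebuild_commands and its argument order are hypothetical, and it assumes sdX-style device names where the partition number is appended directly to the disk name (NVMe-style names such as nvme0n1p2 would need a different suffix).

```shell
# Hypothetical dry-run helper: print, but do not run, the rebuild commands.
# Arguments: healthy source disk, replaced disk, md array, member partition number.
print_rebuild_commands() {
    if [ "$#" -ne 4 ]; then
        echo "usage: print_rebuild_commands SRC_DISK NEW_DISK ARRAY PART_NUM" >&2
        return 1
    fi
    src=$1 new=$2 array=$3 part=$4

    # Guard against overwriting the healthy disk with itself.
    if [ "$src" = "$new" ]; then
        echo "source and replaced disk must differ" >&2
        return 1
    fi

    echo "sudo sgdisk -R $new $src"                    # copy partition table src -> new
    echo "sudo sgdisk -G $new"                         # randomize GUIDs on the new disk
    echo "sudo sgdisk -p $src"                         # print both tables for comparison
    echo "sudo sgdisk -p $new"
    echo "sudo mdadm --manage $array --add $new$part"  # add the member partition
}

# Example: print_rebuild_commands /dev/sda /dev/sdh /dev/md0 2
```

With the example arguments the helper prints the same five commands used in this procedure, so the output can be checked and then run by hand.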

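Check the status of the rebuild. While the array is recovering, /proc/mdstat shows the array as degraded ([8/7], with the missing member marked _) together with a recovery progress line: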

robh@wdqs2007:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid10 sdh2[8] sda2[0] sde2[4] sdf2[5] sdg2[6] sdc2[2] sdd2[3] sdb2[1]
3749068800 blocks super 1.2 512K chunks 2 near-copies [8/7] [UUUUUUU_]
[>....................]  recovery =  0.0% (132736/937267200) finish=470.6min speed=33184K/sec
bitmap: 28/28 pages [112KB], 65536KB chunk
unused devices: <none>
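
For repeated checks during a long rebuild, the relevant fields of the output above can be pulled out with a short awk filter. This is a minimal sketch, not an established tool: the function name mdstat_summary is hypothetical, and it assumes the mdstat layout shown above (an "mdN :" line, a "blocks" line carrying the [expected/active] member count, and an optional "recovery =" line).

```shell
# Hypothetical helper: summarise array state and rebuild progress.
# Usage: mdstat_summary [file]   (defaults to /proc/mdstat)
mdstat_summary() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { array = $1 }

        # The "blocks" line carries [expected/active], e.g. [8/7].
        /blocks/ && match($0, /\[[0-9]+\/[0-9]+\]/) {
            members = substr($0, RSTART + 1, RLENGTH - 2)
            split(members, n, "/")
            state = (n[1] == n[2]) ? "healthy" : "degraded"
            printf "%s members=%s state=%s\n", array, members, state
        }

        # The recovery line carries the progress percentage.
        /recovery =/ && match($0, /[0-9.]+%/) {
            printf "%s recovery=%s\n", array, substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the output above, this would report md0 with members=8/7 as degraded and a recovery value of 0.0%. Once the rebuild finishes, the recovery line disappears from /proc/mdstat and the member count returns to [8/8].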