SRE/Dc-operations/Sw raid rebuild directions
When a defective disk in a software RAID array is swapped out, the array is not rebuilt automatically. The new disk has to be partitioned and added to the array with the following procedure (the wdqs2007 disk replacement from this task is used as an example; sdh is the replaced disk and sda is a healthy member of the array):
check to see if the new disk is detected along with the existing disks:
sudo lshw -class disk
copy the partition table of sda to sdh (sdh was replaced); the argument order matters: the target disk follows -R and the source disk comes last
sudo sgdisk -R /dev/sdh /dev/sda
randomize the disk and partition GUIDs on sdh so they do not duplicate the ones copied from sda
sudo sgdisk -G /dev/sdh
print the partition tables of both disks and confirm that the partition layouts now match (the GUIDs are expected to differ)
sudo sgdisk -p /dev/sda
sudo sgdisk -p /dev/sdh
add the RAID partition of the new SSD (sdh2) back into the array (md0)
sudo mdadm --manage /dev/md0 --add /dev/sdh2
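check the rebuild status; while the array is rebuilding it reports a missing member ([8/7]) and a recovery progress line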
robh@wdqs2007:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid10 sdh2[8] sda2[0] sde2[4] sdf2[5] sdg2[6] sdc2[2] sdd2[3] sdb2[1]
      3749068800 blocks super 1.2 512K chunks 2 near-copies [8/7] [UUUUUUU_]
      [>....................] recovery = 0.0% (132736/937267200) finish=470.6min speed=33184K/sec
      bitmap: 28/28 pages [112KB], 65536KB chunk

unused devices: <none>
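The audit and status checks above can also be scripted. The following is a minimal sketch, not part of the original procedure. The function names (partition_rows, same_partition_layout, mdstat_summary) are made up for illustration. It assumes the usual `sgdisk -p` table layout (a header row starting with "Number", followed by one row per partition) and the `[total/active]` member counter that /proc/mdstat prints for each array.

```shell
# partition_rows: read `sgdisk -p` output on stdin and print only the
# partition entries as "number start end typecode".
# Assumption: rows follow a header line starting with "Number" and have the
# columns Number, Start, End, Size (two fields), Code, Name.
partition_rows() {
    awk 'seen && NF { print $1, $2, $3, $6 } /^Number/ { seen = 1 }'
}

# same_partition_layout: succeed when two saved `sgdisk -p` outputs describe
# the same partitions. Disk and partition GUIDs are ignored on purpose,
# since they are expected to differ after `sgdisk -G`.
same_partition_layout() {
    [ "$(partition_rows < "$1")" = "$(partition_rows < "$2")" ]
}

# mdstat_summary: read /proc/mdstat-style text on stdin and print, for each
# md array, whether it is degraded and any recovery or resync progress.
mdstat_summary() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        # The status line carries a [total/active] counter such as [8/7].
        array != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[1] == n[2]) ? "healthy" : "degraded"
            printf "%s: %s (%s of %s members active)\n", array, state, n[2], n[1]
        }
        # Progress is only printed while a rebuild or resync is running.
        array != "" && match($0, /(recovery|resync) *= *[0-9.]+%/) {
            printf "%s: %s\n", array, substr($0, RSTART, RLENGTH)
        }
    '
}

# Example usage on the host (paths under /tmp are arbitrary):
#   sudo sgdisk -p /dev/sda > /tmp/sda.txt
#   sudo sgdisk -p /dev/sdh > /tmp/sdh.txt
#   same_partition_layout /tmp/sda.txt /tmp/sdh.txt && echo "layouts match"
#   mdstat_summary < /proc/mdstat
```

Both helpers only read text and never touch the disks, so they can be run at any point in the procedure. Their output is a convenience; the full `sgdisk -p` and /proc/mdstat output remains the authoritative view.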