Raid and MegaCli

From Wikitech

Raid setup at Wikimedia

RAID setups for database:

  • raid-10
  • 256k stripe
  • writeback cache
  • no read ahead

How to set up the initial raid group

Querying the RAID card

Get event logs

# Install megacli if it is not present, then write the adapter event log to all.txt
sudo apt install -qqy megacli && sudo megacli -AdpAliLog -a0 > all.txt

Logical device info

Things to look for here:

  • Stripe size
  • Cache policy

root@db1047:/a/sqldata# megacli -LDInfo -Lall -Aall

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
root@db1047:/a/sqldata# 
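
The two items to look for can also be checked mechanically. The following is a minimal sketch, not part of any Wikimedia tooling: it parses `megacli -LDInfo` text like the output above and compares it with the database settings listed at the top of this page. The function names (`parse_ldinfo`, `check_disk`, `total_drives`) are hypothetical, and the parser assumes the "Key: Value" layout shown here; other megacli versions may label fields differently.

```python
import re

# Expected database settings, taken from the list at the top of this page.
EXPECTED = {
    "stripe_size": "256kB",
    "write_policy": "WriteBack",
    "read_policy": "ReadAheadNone",
}

SAMPLE = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
"""


def parse_ldinfo(text):
    """Return one dict per virtual disk found in LDInfo-style text."""
    disks = []
    for line in text.splitlines():
        line = line.strip()
        # "Virtual Drive:" is accepted as well; that spelling is an assumption
        # about other megacli versions, not something shown on this page.
        match = re.match(r"Virtual (?:Disk|Drive):\s*(\d+)", line)
        if match:
            disks.append({"id": int(match.group(1))})
            continue
        if not disks or ":" not in line:
            continue
        key, _, value = line.partition(":")
        disks[-1][key.strip()] = value.strip()
    return disks


def check_disk(disk, expected=EXPECTED):
    """Return a list of human-readable mismatches for one virtual disk."""
    problems = []
    if disk.get("Stripe Size") != expected["stripe_size"]:
        problems.append(f"stripe size is {disk.get('Stripe Size')}")
    policy = [p.strip() for p in disk.get("Current Cache Policy", "").split(",")]
    for wanted in (expected["write_policy"], expected["read_policy"]):
        if wanted not in policy:
            problems.append(f"cache policy lacks {wanted}")
    return problems


def total_drives(disk):
    """Drives per span times span depth (assumes the count shown is per span)."""
    return int(disk["Number Of Drives"]) * int(disk["Span Depth"])


if __name__ == "__main__":
    for disk in parse_ldinfo(SAMPLE):
        print(disk["id"], check_disk(disk) or "matches expected settings")
```

For the sample above, `total_drives` multiplies 2 drives per span by a span depth of 6, which is consistent with a 12-disk RAID 10 array.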

Firmware, serial number, and boatloads of other default config

root@db1047:/a/sqldata# megacli -AdpAllInfo -aALL | less

Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : PERC H700 Integrated
Serial No       : 11L006J
FW Package Build: 12.10.0-0025

                    Mfg. Data
                ================
Mfg. Date       : 01/22/11
Rework Date     : 01/22/11
Revision No     : A00   
Battery FRU     : N/A   

                Image Versions In Flash:
                ================
BIOS Version       : 3.18.00_4.09.05.00_0x0416A000
FW Version         : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
Ctrl-R Version     : 2.02-0025
Boot Block Version : 2.02.00.00-0000

MORE...
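
Because `-AdpAllInfo` prints a lot of text, it can help to pull out only the fields of interest. This is a rough sketch under the assumption that the fields appear as "Key : Value" lines, as in the excerpt above; `parse_adapter_info` is a hypothetical helper, and with several adapters later values would overwrite earlier ones.

```python
SAMPLE = """\
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : PERC H700 Integrated
Serial No       : 11L006J
FW Package Build: 12.10.0-0025

                Image Versions In Flash:
                ================
BIOS Version       : 3.18.00_4.09.05.00_0x0416A000
FW Version         : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
"""


def parse_adapter_info(text):
    """Collect 'Key : Value' lines into a dict, skipping headings and rulers."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key and value:
            info[key] = value
    return info


if __name__ == "__main__":
    info = parse_adapter_info(SAMPLE)
    for field in ("Product Name", "Serial No", "FW Package Build"):
        print(f"{field}: {info.get(field)}")
```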

Icinga checks

As of 2017, the check_raid Python script checks only the state of the controller (optimal, failed, rebuilding, etc.), the logical disks, and the physical disks:

sudo /usr/local/lib/nagios/plugins/check_raid [megacli]
OK: optimal, 1 logical, 2 physical

An optional parameter was introduced to also check the write cache policy:

sudo /usr/local/lib/nagios/plugins/check_raid --policy WriteBack megacli
OK: optimal, 1 logical, 2 physical, WriteBack policy

It will complain if, for any reason, the policy differs from the one given:

/usr/local/lib/nagios/plugins/check_raid --policy WriteThrough
CRITICAL: 1 LD(s) must have write cache policy WriteThrough, currently using: WriteBack

It does not take into account the state of the BBU, the size or existence of the cache, etc.
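
The comparison behind the `--policy` option can be pictured as follows. This is only an illustrative sketch of the behaviour described above, not the actual check_raid source; `check_write_policy` and its arguments are hypothetical names. Like the real check, it looks only at the write cache policy and not at the BBU or the cache itself.

```python
def check_write_policy(current_policies, expected):
    """Return (status, message) mimicking the plugin output quoted above.

    current_policies: write cache policy of each logical disk, e.g. ["WriteBack"].
    expected: the policy that would be passed via --policy.
    """
    wrong = [policy for policy in current_policies if policy != expected]
    if wrong:
        return (
            "CRITICAL",
            f"{len(wrong)} LD(s) must have write cache policy {expected}, "
            f"currently using: {', '.join(wrong)}",
        )
    return ("OK", f"{expected} policy")


if __name__ == "__main__":
    status, message = check_write_policy(["WriteBack"], "WriteThrough")
    print(f"{status}: {message}")
```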

It is enabled in Puppet by setting the $write_cache_policy parameter of the ::raid class, which is normally set through profile::base::check_raid_policy.

See Also