Raid and MegaCli
RAID setup at Wikimedia
RAID setup for database hosts:
- raid-10
- 256k stripe
- writeback cache
- no read ahead
How to set up the initial RAID group
Querying the RAID card
Get event logs
# installs megacli if not present and outputs the logs to all.txt
sudo apt install -qqy megacli && sudo megacli -AdpAliLog -a0 > all.txt
Logical device info
Things to look for here:
- Stripe size
- Cache policy
root@db1047:/a/sqldata# megacli -LDInfo -Lall -Aall

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
root@db1047:/a/sqldata#
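The two items to look for, stripe size and cache policy, can also be checked mechanically. The following is a minimal sketch, assuming the `-LDInfo` output has the "Key: value" layout shown above. The names `parse_ldinfo`, `check_disk`, `EXPECTED_STRIPE`, `EXPECTED_POLICIES` and `LDINFO_SAMPLE` are hypothetical helpers written for this page and are not part of MegaCli or check_raid; the expected values are the database settings listed at the top of this page.

```python
# Hypothetical helpers for illustration; not part of MegaCli or check_raid.
# Expected values are the database RAID settings listed on this page.
EXPECTED_STRIPE = "256kB"
EXPECTED_POLICIES = ("WriteBack", "ReadAheadNone")

# Output of `megacli -LDInfo -Lall -Aall` as shown above (prompt lines removed).
LDINFO_SAMPLE = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
"""


def parse_ldinfo(text):
    """Split -LDInfo output into one {field: value} dict per virtual disk."""
    disks = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without a colon
        key, value = key.strip(), value.strip()
        if key == "Virtual Disk":
            # A new virtual disk section starts here.
            disks.append({"Virtual Disk": value})
        elif disks:
            # Lines before the first "Virtual Disk:" (adapter header) are ignored.
            disks[-1][key] = value
    return disks


def check_disk(disk):
    """Return a list of deviations from the expected database setup."""
    problems = []
    stripe = disk.get("Stripe Size")
    if stripe != EXPECTED_STRIPE:
        problems.append("stripe size is {}, expected {}".format(stripe, EXPECTED_STRIPE))
    current = [p.strip() for p in disk.get("Current Cache Policy", "").split(",")]
    for policy in EXPECTED_POLICIES:
        if policy not in current:
            problems.append("cache policy lacks {}".format(policy))
    return problems


if __name__ == "__main__":
    for disk in parse_ldinfo(LDINFO_SAMPLE):
        print(disk["Virtual Disk"], check_disk(disk) or "matches the expected setup")
```

The sketch compares the Current Cache Policy rather than the Default one, since the current policy is what the controller is actually applying (for example after falling back because of a bad BBU).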
Firmware, serial number, and boatloads of other default config
root@db1047:/a/sqldata# megacli -AdpAllInfo -aALL | less

Adapter #0

==============================================================================
Versions
================
Product Name : PERC H700 Integrated
Serial No : 11L006J
FW Package Build: 12.10.0-0025

Mfg. Data
================
Mfg. Date : 01/22/11
Rework Date : 01/22/11
Revision No : A00
Battery FRU : N/A

Image Versions In Flash:
================
BIOS Version : 3.18.00_4.09.05.00_0x0416A000
FW Version : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
Ctrl-R Version : 2.02-0025
Boot Block Version : 2.02.00.00-0000

MORE...
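Since the full `-AdpAllInfo` output is long, it can help to pull out only the fields of interest. The following is a minimal sketch, assuming a single adapter and the "Key : value" layout shown above. `parse_adapter_info` and `ADPINFO_SAMPLE` are hypothetical names used for illustration only.

```python
# Hypothetical helper for illustration; not part of MegaCli.
# Excerpt of `megacli -AdpAllInfo -aALL` as shown above.
ADPINFO_SAMPLE = """\
Adapter #0

==============================================================================
Versions
================
Product Name : PERC H700 Integrated
Serial No : 11L006J
FW Package Build: 12.10.0-0025

Image Versions In Flash:
================
BIOS Version : 3.18.00_4.09.05.00_0x0416A000
FW Version : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
"""


def parse_adapter_info(text):
    """Collect 'Key : value' lines into a dict, skipping headings and rulers."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Assumption: one adapter; with several, the first value seen wins.
            info.setdefault(key.strip(), value.strip())
    return info


if __name__ == "__main__":
    info = parse_adapter_info(ADPINFO_SAMPLE)
    for field in ("Product Name", "Serial No", "FW Version"):
        print("{}: {}".format(field, info.get(field)))
```

Section headings such as "Image Versions In Flash:" have nothing after the colon, so they are dropped rather than stored as empty fields.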
Icinga checks
As of 2017, the check_raid Python script only checks the state of the controller (Optimal, failed, rebuilding, etc.), the logical disks, and the physical disks:
sudo /usr/local/lib/nagios/plugins/check_raid [megacli]
OK: optimal, 1 logical, 2 physical
An optional parameter was introduced to also check the write cache policy:
sudo /usr/local/lib/nagios/plugins/check_raid --policy WriteBack megacli
OK: optimal, 1 logical, 2 physical, WriteBack policy
It will complain if, for any reason, the policy differs from the one given:
/usr/local/lib/nagios/plugins/check_raid --policy WriteThrough
CRITICAL: 1 LD(s) must have write cache policy WriteThrough, currently using: WriteBack
It does not take into account the state of the BBU, the size or existence of the cache, etc.
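The policy comparison itself is simple. The following is a simplified sketch of that logic, not the actual check_raid source: `check_write_policy` is a hypothetical function, the CRITICAL message mirrors the example output above, and the OK message is shortened because this sketch knows nothing about controller or physical disk state. The return codes follow the usual Nagios/Icinga plugin convention (0 for OK, 2 for CRITICAL).

```python
# Simplified illustration only; this is NOT the real check_raid implementation.
OK, CRITICAL = 0, 2  # standard Nagios/Icinga plugin exit codes


def check_write_policy(current_policies, expected):
    """Compare each logical disk's write cache policy with the expected one.

    current_policies: one policy string per logical disk, e.g. ["WriteBack"].
    expected: the policy passed via --policy, e.g. "WriteThrough".
    Returns a (exit_code, message) tuple.
    """
    wrong = [policy for policy in current_policies if policy != expected]
    if wrong:
        message = (
            "CRITICAL: {} LD(s) must have write cache policy {}, "
            "currently using: {}".format(len(wrong), expected, ", ".join(wrong))
        )
        return CRITICAL, message
    # Assumption: shortened OK text; the real script also reports disk counts.
    return OK, "OK: {} policy".format(expected)


if __name__ == "__main__":
    code, message = check_write_policy(["WriteBack"], "WriteThrough")
    print(code, message)
```

As with the real check, only the write cache policy string is compared here, so a healthy-looking result says nothing about the BBU or the cache itself.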
It is enabled in Puppet by setting the ::raid class parameter $write_cache_policy, normally set through profile::base::check_raid_policy.