PERCCli
Later (2022) models of the Dell PowerEdge servers have migrated from the MegaRAID controller and the MegaCli tool to PERC6 (PowerEdge RAID Controller) cards. These RAID cards are managed with Dell's perccli and perccli64 tools. We use perccli64, which is the 64-bit version.
The full manual from Dell for the perccli tool is available here: Dell EMC PowerEdge RAID Controller Command Line Interface
Installation
The perccli64 tool is installed by Puppet on the relevant servers. The servers with the PERC cards are identified by the PCI ID of those cards. As we do not have permission to redistribute the Dell software, the DEB packages are only available via our private APT repository (see Reprepro for information on the private repository).
Monitoring
Integration into monitoring is handled by the get-raid-status-perccli script. This can also be manually run using:
$ sudo /usr/local/lib/nagios/plugins/get-raid-status-perccli
The perccli64 tool supports exporting data to JSON, using the "J" option after each command. E.g. the following command will list all installed RAID controllers as JSON:
$ sudo perccli64 show all J
To use the perccli64 tool to locate any errors, first find the available controllers and their IDs. We typically have only one controller per server, so its ID should always be 0. Verify that this is the case using:
$ sudo perccli64 show all
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-17-amd64
Status Code = 0
Status = Success
Description = None
Number of Controllers = 1
Host Name = dumpsdata1007
Operating System = Linux 5.10.0-17-amd64
System Overview :
===============
---------------------------------------------------------------------------
Ctl Model           Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
---------------------------------------------------------------------------
0   PERCH750Adapter 8     14  2   0     2   0     Opt On  -  N   0    Opt
---------------------------------------------------------------------------
...
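The single-controller check can be scripted. The following is a minimal sketch, not part of the installed tooling: controller_count is a hypothetical helper name, and it assumes the plain-text output contains a "Number of Controllers = N" line, as in the example output above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with perccli64): print the value of the
# "Number of Controllers" line from `perccli64 show all` output on stdin.
controller_count() {
    awk -F' = ' '
        /^[ \t]*Number of Controllers/ {
            sub(/[ \t\r]+$/, "", $2)   # drop trailing whitespace
            print $2
            exit
        }'
}

# Usage (requires the tool and root privileges):
#   sudo perccli64 show all | controller_count
```

If the helper prints anything other than 1, the /c0 path used in the rest of this page may not be the only controller to check.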
Battery status
To view the state of the BBU (Battery Backup Unit), run the following command, where /c0 is controller 0 (the first controller):
$ sudo perccli64 /c0/bbu show status
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-17-amd64
Controller = 0
Status = Success
Description = None
BBU_Info :
========
----------------------
Property      Value
----------------------
Type          BBU
Voltage       3938 mV
Current       0 mA
Temperature   36 C
Battery State Optimal
----------------------
...
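To pick out just the battery state, the output can be filtered. This is a sketch only: bbu_state is a hypothetical helper name, and it assumes the table contains a "Battery State" row formatted like the one in the example output above.

```shell
#!/bin/sh
# Hypothetical helper: print the "Battery State" value from the output of
# `perccli64 /c0/bbu show status`, read on stdin.
bbu_state() {
    awk '
        /^[ \t]*Battery State/ {
            sub(/^[ \t]*Battery State[ \t]*/, "")  # strip the property name
            sub(/[ \t\r]+$/, "")                   # strip trailing whitespace
            print
            exit
        }'
}

# Usage (requires the tool and root privileges):
#   sudo perccli64 /c0/bbu show status | bbu_state
```

The example output reports "Optimal"; any other value is worth a closer look at the full output.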
Relearn cycle
A BBU learn cycle fully discharges and then recharges the battery, which lets the controller detect reduced battery capacity over time. After a cycle, the controller updates its record of the BBU's capacity. A cycle may take in excess of 24 hours.
Check that a cycle is not currently running, or see when the next cycle will start automatically:
$ sudo perccli64 /c0/bbu show learn
...
BBU Learn :
=========
-----------------------------------------------------
Property           Value
-----------------------------------------------------
Auto Learn Mode    Transparent
Schedule Time      SUN, October 30, 2022 at 14:38:57
Interval           12 Weeks 6 Days
Learn Cycle Active No
-----------------------------------------------------
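Before forcing a cycle, it can help to read the "Learn Cycle Active" value on its own. The sketch below assumes that row appears as in the example output above; learn_cycle_active is a hypothetical helper name, and only the value "No" is confirmed by that example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Learn Cycle Active" row from
# the output of `perccli64 /c0/bbu show learn`, read on stdin.
learn_cycle_active() {
    awk '
        /^[ \t]*Learn Cycle Active/ {
            sub(/[ \t\r]+$/, "")   # strip trailing whitespace, re-splits fields
            print $NF              # the value is the last field on the row
            exit
        }'
}

# Usage (requires the tool and root privileges):
#   sudo perccli64 /c0/bbu show learn | learn_cycle_active
```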
To force a learn cycle, run:
$ sudo perccli64 /c0/bbu start learn
Virtual disk (array) status
To list all virtual disks (RAID arrays) and their status, run:
$ sudo perccli64 /c0/vall show
Virtual Drives :
==============
----------------------------------------------------------------
DG/VD TYPE   State Access Consist Cache Cac sCC Size       Name
----------------------------------------------------------------
1/238 RAID1  Optl  RW     Yes     RWBD  -   OFF 446.625 GB
0/239 RAID10 Optl  RW     Yes     RWBD  -   OFF 43.661 TB
----------------------------------------------------------------
The output above shows two disk groups (DG), 1 and 0, exposed as virtual disks (VD) 238 and 239 respectively. Both are currently "Optimal" (Optl).
More details on an individual VD can be had by running the following, where v238 is virtual disk 238, the 446 GB array shown by the previous command:
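When a controller has many arrays, the listing can be filtered down to the ones that are not optimal. This is a sketch under assumptions: non_optimal_vds is a hypothetical helper name, and it relies on the column layout of the example output above (DG/VD in the first column, State in the third).

```shell
#!/bin/sh
# Hypothetical helper: from `perccli64 /c0/vall show` output on stdin, print
# "DG/VD State" for every virtual disk whose State column is not "Optl".
non_optimal_vds() {
    # Data rows are recognised by a first field of the form <number>/<number>;
    # header, separator and title lines do not match and are skipped.
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Optl" { print $1, $3 }'
}

# Usage (requires the tool and root privileges):
#   sudo perccli64 /c0/vall show | non_optimal_vds
```

Empty output means every listed virtual disk reported Optl.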
$ sudo perccli64 /c0/v238 show all
...
PDs for VD 238 :
==============
-----------------------------------------------------------------------------
EID:Slt DID State DG Size       Intf Med SED PI SeSz Model           Sp Type
-----------------------------------------------------------------------------
64:12   0   Onln  1  446.625 GB SATA SSD N   N  512B HFS480G3H2X069N U  -
64:13   4   Onln  1  446.625 GB SATA SSD N   N  512B HFS480G3H2X069N U  -
-----------------------------------------------------------------------------
...
VD238 Properties :
================
Strip Size = 512 KB
Number of Blocks = 936640512
Span Depth = 1
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = None
Active Operations = None
Exposed to OS = Yes
OS Drive Name = /dev/sda
Creation Date = 29-06-2022
Creation Time = 04:55:14 PM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 670b5e80fe06a9002a4f4072aa6e59b2
Unmap Enabled = N/A
Among other useful information, this identifies the OS device backed by the virtual disk (in this case /dev/sda) and the physical devices used to construct it, as well as their state.
Physical drives
To debug a physical drive we first need to identify and locate it. The following command outputs a lot of information, but we mainly care about the topology (if we don't already know the layout of the virtual disks) and the drive list.
$ sudo perccli64 /c0/dall show all
...
------------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type   State BT Size       PDC  PI SED DS3  FSpace TR
------------------------------------------------------------------------------
0  -   -   -        -   RAID10 Optl  N  43.661 TB  dflt N  N   dflt N      N
0  0   -   -        -   RAID1  Optl  N  43.661 TB  dflt N  N   dflt N      N
0  0   0   64:0     1   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   1   64:1     2   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   2   64:2     6   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   3   64:3     7   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   4   64:4     5   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   5   64:5     9   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   6   64:6     3   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   7   64:7     12  DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   8   64:8     8   DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   9   64:9     11  DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   10  64:10    10  DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
0  0   11  64:11    13  DRIVE  Onln  N  7.276 TB   dflt N  N   dflt -      N
1  -   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N
1  0   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N
1  0   0   64:12    0   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N
1  0   1   64:13    4   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N
------------------------------------------------------------------------------
...
DG Drive LIST :
=============
----------------------------------------------------------------------------------
EID:Slt DID State DG Size       Intf Med SED PI SeSz Model                Sp Type
----------------------------------------------------------------------------------
64:0    1   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:1    2   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:2    6   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:3    7   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:4    5   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:5    9   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:6    3   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:7    12  Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:8    8   Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:9    11  Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:10   10  Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:11   13  Onln  0  7.276 TB   SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
64:12   0   Onln  1  446.625 GB SATA SSD N   N  512B HFS480G3H2X069N      U  -
64:13   4   Onln  1  446.625 GB SATA SSD N   N  512B HFS480G3H2X069N      U  -
----------------------------------------------------------------------------------
As with the controllers, there will typically be only one enclosure, so we just need its ID (EID), in this case 64. The number after the colon in the EID:Slt column is the slot used in the drive path, e.g. /c0/e64/s12.
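With 14 drives in the list, scanning for the one that is not online is easier with a filter. The following sketch assumes the drive list layout shown above (EID:Slt, DID and State as the first three columns); non_online_drives is a hypothetical helper name. It reports any state other than Onln, which may include states that are not failures.

```shell
#!/bin/sh
# Hypothetical helper: from `perccli64 /c0/dall show all` output on stdin,
# print "EID:Slt DID State" for every drive-list row whose State is not "Onln".
non_online_drives() {
    # Drive-list rows start with <enclosure>:<slot>. Topology rows start with
    # a bare disk group number, so they do not match and are skipped.
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 != "Onln" { print $1, $2, $3 }'
}

# Usage (requires the tool and root privileges):
#   sudo perccli64 /c0/dall show all | non_online_drives
```

The first field of each reported row gives the enclosure and slot needed to address the drive.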
Replace a drive
To identify the failed disk (in this example disk 12) in the enclosure (enclosure 64) and stop the disk, run:
$ sudo perccli64 /c0/e64/s12 start locate
$ sudo perccli64 /c0/e64/s12 spindown
The first command lights up the indicator LED for that drive, and the second spins the drive down. To do the reverse, use "stop locate" and "spinup".
Verify that the virtual disk is rebuilding after drive replacement (where the virtual disk is ID 238 on controller 0):
$ sudo perccli64 /c0/v238 show