SRE Data Center Operations
DC Operations | About | Projects & Workboards | IRC: #wikimedia-dcops
HW Troubleshooting | HW Specific Documentation
This landing page serves as an index of the Wikitech pages specific to the DC Ops team. In addition to this page, Data Center Operations maintains a landing page in Phabricator.
SLAs
We make every attempt to resolve all tasks and requests in a timely manner, and have implemented the following SLA targets.
Please note that none of these SLAs starts until both conditions are met: the SLA start listed in the table has occurred, and the task carries the proper project tags. See the details for each type of task request in its section below. Please use the templates listed below.
Project | Business Days to Resolve | SLA start |
---|---|---|
Procurement | 90 | Date of Task filing |
Racking/Installation | 30 | Arrival of Hardware to DC site |
Hardware Failure / Repair | 10 | Date of Task filing |
Decommission | 45 | When all sub-team steps are complete and task is assigned to on-site |
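As a rough illustration of how the targets in the table translate into a due date, here is a minimal Python sketch. It rests on assumptions that this page does not state: business days are taken to be Monday through Friday, holidays are ignored, and the SLA start day itself is not counted. The names `SLA_BUSINESS_DAYS`, `add_business_days`, and `sla_due_date` are hypothetical and do not refer to any existing DC Ops tooling.

```python
from datetime import date, timedelta

# SLA targets in business days, copied from the table above.
SLA_BUSINESS_DAYS = {
    "Procurement": 90,
    "Racking/Installation": 30,
    "Hardware Failure / Repair": 10,
    "Decommission": 45,
}


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: business days are Monday-Friday; holidays are not considered,
    and the start day itself does not count toward the total.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def sla_due_date(project: str, sla_start: date) -> date:
    """Return the target resolution date for a project type and SLA start date."""
    return add_business_days(sla_start, SLA_BUSINESS_DAYS[project])


if __name__ == "__main__":
    # Example: a hardware repair task whose SLA starts on Monday 2024-01-01.
    print(sla_due_date("Hardware Failure / Repair", date(2024, 1, 1)))
```

The `sla_start` argument should be the date on which the SLA start condition for that project type is met (for example, hardware arrival for Racking/Installation), not necessarily the task filing date.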
Other Info
Common Data center Specifications
SRE/Infrastructure Naming Conventions - shared SRE Department document of host-name standards.
Insetup roles per team can be browsed directly in the repo at: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/role/manifests/insetup/
Examples (not a complete list; included to illustrate the format):
- Data Engineering: role::insetup::data_engineering
- Machine Learning: role::insetup::machine_learning
Runbooks
- Hardware Troubleshooting Runbook
- Provision cookbook troubleshooting
- Re-image cookbook troubleshooting
- Securely Erasing Media
- Rebuilding SW Raid with SGDISK & MDADM
- Platform Specific Documentation - vendor specific hardware details (Dell, HP, OpenGear, ServerTech)