SRE/SRE tooling
List of the automation tools and libraries, reporting Web UIs, and authoritative Web UIs used by the SRE team.
Tools
- Python/Wmflib: a Python package that contains custom modules to interact with the WMF production infrastructure, installable in any host.
- Cumin: parallel remote execution CLI tool and Python library with fine-grained target selection and output deduplication.
- Conftool: high-level CLI tool and Python library to manage the live state stored in etcd. Load Balanced Services And Conftool has further information on the scripts used to operate on clustered services.
- Dbctl: conftool extension to specifically manage MediaWiki's database configuration in etcd.
- requestctl: conftool extension to control access and routing of web requests.
- Spicerack: centralized Python library to automate tasks, with modules to interact with most components of the infrastructure.
- Cookbooks: Python scripts that adhere to Spicerack's API and automate tasks using Spicerack's functionality.
- Server Lifecycle/Reimage: fully automated OS (re)installation for physical hosts.
- Debdeploy: CLI tool to check and upgrade Debian packages.
- Homer: network devices configuration management CLI tool.
- WMFMariadbpy: work-in-progress library to automate administration of Wikimedia MySQL/MariaDB instances.
- Country latency measurement: tool to measure per-country latency via RIPE Atlas.
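The "output deduplication" mentioned for Cumin above can be illustrated with a minimal sketch: hosts that return identical output are grouped together, so the operator reads each distinct result once. This is not Cumin's implementation or API. The function name `group_by_output`, the input shape (a dict of host name to output string), and the example host names are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_output(results):
    """Group host names by identical command output.

    `results` is a hypothetical mapping of host name -> output string.
    Returns a list of (sorted_hosts, output) pairs, largest group first.
    """
    groups = defaultdict(list)
    for host, output in results.items():
        groups[output].append(host)
    # Largest groups first; ties are broken by the sorted host list.
    return sorted(
        ((sorted(hosts), output) for output, hosts in groups.items()),
        key=lambda pair: (-len(pair[0]), pair[0]),
    )


if __name__ == "__main__":
    # Illustrative data only; these host names are made up.
    sample = {
        "host1001.example.org": "active",
        "host1002.example.org": "active",
        "host1003.example.org": "inactive",
    }
    for hosts, output in group_by_output(sample):
        print(f"{len(hosts)} host(s): {', '.join(hosts)}")
        print(f"  {output}")
```

Grouping by the output string keeps the sketch short. A real tool would likely also need to handle per-command output, exit codes, and very large outputs.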
Reporting Web UI
- icinga.w.o: central monitoring and alerting platform. See also Icinga.
- alerts.w.o: real-time Alertmanager frontend. See also Prometheus.
- grafana.w.o: central observability platform. See also Grafana.
- OpenSearch Dashboards (a.k.a. logstash): central logging platform. See also Logstash.
- Server Admin Log (a.k.a. SAL): public log of all !log actions on the #wikimedia-operations IRC channel. Also available at toolforge:sal/production.
- noc.w.o: publicly exposed configuration files and live state related to MediaWiki.
- puppetboard.w.o: PuppetDB API Web UI to inspect all the data stored in PuppetDB (hosts, facts, catalogs, reports of last runs).
- debmonitor.w.o: Debian package tracker website and CLI tool to track installed and upgradable packages. See also DebMonitor.
- orchestrator.w.o: database cluster analytics and performance Web UI. See also Orchestrator.
- librenms.w.o: Network-specific monitoring platform. See also LibreNMS.
- turnilo.w.o: managed by Analytics but also used by SRE, especially the wmf_netflow and webrequest_sampled_live datasets.
- Superset dashboards for webrequest sampled.
Authoritative Web UI
- netbox.w.o: data center infrastructure management (DCIM) and IP address management (IPAM) Web UI, a.k.a. "the source of truth". See also Netbox.