List of automation tools and libraries, reporting and authoritative Web UIs used by the SRE team.
- Python/Wmflib: a Python package that contains custom modules to interact with the WMF production infrastructure, installable on any host.
- Cumin: parallel remote execution CLI tool and Python library with fine-grained target selection and output deduplication.
- Conftool: high-level CLI tool and Python library to manage the live state stored in etcd. Load Balanced Services And Conftool has further information on the scripts used to operate on clustered services.
- Dbctl: conftool extension to specifically manage MediaWiki's database configuration in etcd.
- Spicerack: centralized Python library to automate tasks, with modules to interact with most components of the infrastructure.
- Cookbooks: Python scripts that adhere to Spicerack's API and automate tasks using Spicerack's functionality.
- Server Lifecycle/Reimage: fully automated OS (re)installation for physical hosts.
- Debdeploy: CLI tool to check and upgrade Debian packages.
- Homer: network devices configuration management CLI tool.
- WMFMariadbpy: work-in-progress library to automate the administration of Wikimedia MySQL/MariaDB instances.
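The "output deduplication" mentioned for Cumin above can be illustrated with a small, self-contained Python sketch: hosts that returned identical output are grouped together so each distinct output is shown only once. This is not Cumin's implementation or API; the function name `group_hosts_by_output` and the hostnames are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def group_hosts_by_output(results):
    """Group hosts that produced identical command output.

    `results` maps a hostname to the text its command returned.
    Returns a list of (sorted hostnames, output) pairs, largest group first.
    NOTE: illustrative only, not how Cumin is implemented.
    """
    groups = defaultdict(list)
    for host, output in results.items():
        groups[output].append(host)
    # Sort hosts within each group, then order the groups by size
    # (descending), breaking ties by hostname for a stable result.
    return sorted(
        ((sorted(hosts), output) for output, hosts in groups.items()),
        key=lambda pair: (-len(pair[0]), pair[0]),
    )


if __name__ == "__main__":
    # Hypothetical hosts and outputs, made up for the example.
    sample = {
        "host1001.example": "kernel 5.10",
        "host1002.example": "kernel 5.10",
        "host1003.example": "kernel 6.1",
    }
    for hosts, output in group_hosts_by_output(sample):
        print(f"{len(hosts)} host(s): {', '.join(hosts)}")
        print(f"    {output}")
```

Grouping by output keeps a report short when most of a fleet answers identically, and makes the few divergent hosts stand out.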
Reporting Web UI
- icinga.w.o: central monitoring and alerting platform. See also Icinga.
- grafana.w.o: central observability platform. See also Grafana.
- OpenSearch Dashboards (a.k.a. logstash): central logging platform. See also Logstash.
- Server Admin Log (a.k.a. SAL): public log of all !log actions on the #wikimedia-operations IRC channel. Also available at https://tools.wmflabs.org/sal/production.
- noc.w.o: publicly exposed configuration files and live state related to MediaWiki.
- puppetboard.w.o: PuppetDB API Web UI to inspect all the data stored in PuppetDB (hosts, facts, catalogs, reports of last runs).
- debmonitor.w.o: Debian package tracker website and CLI tool to track installed and upgradable packages. See also Debmonitor.
- tendril.w.o: database cluster analytics and performance Web UI. See also Tendril.
- librenms.w.o: network-specific monitoring platform. See also LibreNMS.
- turnilo.w.o: Managed by Analytics but also used by SRE, especially the wmf_netflow and webrequest_sampled_128 datasets.