SRE/Observability/Ownership
Backlog | Services | Description | Phabricator tag |
---|---|---|---|
Alerting | Everything related to alerting, including Alertmanager, Icinga, and Splunk On-Call | #observability-alerting | |
Metrics | Aggregatable metrics systems and their interfaces, such as Prometheus, Thanos, Grafana, and Graphite | #observability-metrics | |
Logging | The log pipeline, Logstash, and the OpenSearch ecosystem | #observability-logging | |
Incident Tooling | Incident workflow tooling, such as Dispatch and related systems. | #incident_tooling, #observability-ir-tools | |
Tracing | Distributed tracing support. Not yet developed, but planned. | #observability-tracing | |
Prometheus | Prometheus is a free software application used for event monitoring and alerting. It collects real-time metrics over an HTTP pull model and records them in a time series database, with flexible queries and real-time alerting. | ||
Graphite | Graphite is a real-time time series data store and graph renderer. | https://phabricator.wikimedia.org/tag/graphite/ | |
Alertmanager | Alertmanager is the service (and software) in charge of collecting, de-duplicating, and sending notifications for alerts across WMF infrastructure. It is part of the Prometheus ecosystem, so Prometheus itself natively supports acting as an Alertmanager client. The alerts dashboard, implemented by Karma, can be reached at https://alerts.wikimedia.org/ |