This alert fires when Prometheus has no data from k8s about the tekton-pipelines-controller pod.
Error / Incident
This usually shows up as an alert in Alertmanager.
The alert tells you which project (tools, toolsbeta, ...) is affected.
This one is tricky, and it is usually related to the way we gather metrics on tools/toolsbeta.
Note that this is not directly related to the metricsinfra monitoring project. It comes from Toolforge's own Prometheus setup.
You can start by going to the project's Prometheus page and trying to query the stats there, for example for tools:
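The same check can also be scripted against the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not an existing Toolforge tool. The base URL is a placeholder, and the PromQL selector `up{pod=~"tekton-pipelines-controller.*"}` is an assumption. The metric and label names in the project's Prometheus may differ, so adapt both before relying on the result.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: replace with the project's real Prometheus URL and labels.
PROMETHEUS_URL = "https://prometheus.example.org"  # placeholder, not the real endpoint
QUERY = 'up{pod=~"tekton-pipelines-controller.*"}'  # assumed label selector


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run a PromQL instant query through Prometheus' /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def has_series(response: dict) -> bool:
    """Return True if the query succeeded and returned at least one series.

    An empty result means Prometheus currently has no matching data,
    which is the symptom this alert reports.
    """
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    return len(response["data"]["result"]) > 0
```

Usage would be `has_series(instant_query(PROMETHEUS_URL, QUERY))`. The parsing is kept separate from the HTTP call so it can be checked against a saved JSON response without network access.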
Add new issues here when you encounter them!
Prometheus k8s cert expired
We don't yet have a way to automatically refresh the certificates Prometheus uses to authenticate against k8s, so they must be renewed manually. When they expire, Prometheus can no longer get metrics from k8s, so every k8s-related metric will be missing.
If that's the case, you can follow this guide.
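To confirm whether the certificate has expired, you can read its end date with `openssl x509 -noout -enddate -in <cert file>`, which prints a line such as `notAfter=Jan  5 09:34:43 2018 GMT`. The sketch below turns that line into the number of seconds left. It relies only on the standard library's `ssl.cert_time_to_seconds`. Where the Prometheus client certificate lives on the host is not stated here, so the file path is yours to fill in.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Seconds left before a certificate expires.

    `enddate_line` is the output of `openssl x509 -noout -enddate`,
    e.g. "notAfter=Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    # Drop the "notAfter=" prefix; keep the timestamp in OpenSSL's GMT format.
    _, _, not_after = enddate_line.strip().partition("=")
    expiry = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expiry - now
```

A negative value means Prometheus is presenting an expired certificate, which matches the symptom described above.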
- Karma UI for cloud vps (use
- Tools prometheus
- Toolsbeta prometheus
- Alerts repository
- Toolforge admin docs
Add any incident tasks here!