User:Jhedden/notes/wmcs-monitoring


Current Prometheus

Cloudmetrics

Services:

  • Carbon frontend, cache and relay
  • Prometheus blackbox exporter
  • Prometheus server

Dashboards:

  • Grafana
  • Graphite
  • Prometheus

Prometheus targets [0]:

host prefix    exporters
cloudcephmon   ceph_eqiad
cloudcontrol   blackbox_http, openstack, rabbitmq
cloudnet       blackbox_http
cloudservices  blackbox_http, pdns, pdns_rec
proxy-eqiad    blackbox_http
tools-redis    redis_toolforge (down status, old config?)

Production

Services:

  • Prometheus server

Prometheus targets [1]:

host prefix     exporters
cloudcephmon    node, rsyslog
cloudcephosd    node, rsyslog
cloudcontrol    haproxy, memcached, node, rsyslog
cloudmetrics    envoy, node, rsyslog
cloudnet        node, rsyslog
cloudservices   memcached, node, rsyslog
cloudstore      node, rsyslog
cloudvirt       node, rsyslog
cloudvirt-wdqs  node, rsyslog
labsdb          mysql-labs, node, rsyslog
labstore        node, rsyslog
labweb          envoy, memcached, node, nutcracker, rsyslog

Tools

Services:

  • Prometheus server

Prometheus targets [2]:

host prefix                exporters
clouddb                    toolsdb-mariadb, toolsdb-node
k8s.tools.eqiad            new-k8s-api, new-k8s-cadvisor, new-k8s-haproxy, new-k8s-ingress-nginx, new-k8s-kube-state-metrics
tools-acme-chief           node, ssh_banner
tools-checker              node, ssh_banner
tools-clushmaster          node, ssh_banner
tools-docker-imagebuilder  node, ssh_banner
tools-docker-registry      node, ssh_banner
tools-elastic              node, ssh_banner
tools-k8s-control          new-k8s-nodes, node, ssh_banner
tools-k8s-etcd             etcd, node, ssh_banner
tools-k8s-worker           node, ssh_banner
tools-mail                 node, ssh_banner
tools-package-builder      node, ssh_banner
tools-paws-master          node, ssh_banner
tools-paws-worker          node, ssh_banner
tools-prometheus           node, ssh_banner
tools-proxy                frontproxy-nginx, node, ssh_banner
tools-puppetdb             node, ssh_banner
tools-puppetmaster         node, ssh_banner
tools-redis                node, ssh_banner
tools-sgebastion           node, ssh_banner
tools-sgecron              node, ssh_banner
tools-sgeexec              node, ssh_banner
tools-sgegrid-master       node, ssh_banner
tools-sgegrid-shadow       node, ssh_banner
tools-sge-services         node, ssh_banner
tools-sgewebgrid-generic   node, ssh_banner
tools-sgewebgrid-lighttpd  node, ssh_banner
tools-static               node, ssh_banner


Sources

[0] Cloud metrics

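# Group active targets by host prefix (the instance name with the host
# number and domain stripped) and list the unique job names per prefix.
curl -s http://localhost:9900/labs/api/v1/targets | jq -r \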
 '.data.activeTargets
 | group_by(.labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)"))
 | map({key:(.[0].labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)")),
        value: (map(.labels.job)| unique | join(", "))})
 | from_entries'

[1] Production

curl -s http://localhost:9900/ops/api/v1/targets | jq -r \
 '.data.activeTargets
 | group_by(.labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)"))
 | map({key:(.[0].labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)")),
        value: (map(.labels.job)| unique | join(", "))})
 | from_entries' | grep -E "cloud|labs" | grep -v elastic

[2] Tools

curl -s http://localhost:9902/tools/api/v1/targets | jq -r \
 '.data.activeTargets
 | group_by(.labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)"))
 | map({key:(.[0].labels.instance | sub("(?<h>[a-z][^0-9]+).*$";"\(.h)")),
        value: (map(.labels.job)| unique | join(", "))})
 | from_entries' | sort

Future options

VM Multi-tenancy

The Prometheus OpenStack service discovery (SD) configuration can dynamically inventory scrape targets from OpenStack. Each discovered instance carries its project ID, which Alertmanager can use to route alerts for those targets to specific users.
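
A minimal sketch of what such a scrape job might look like, using Prometheus's `openstack_sd_configs` and the `__meta_openstack_*` labels it exposes. The job name, region, Keystone endpoint, credentials, and exporter port are all placeholders chosen for illustration, not values taken from the WMCS deployment:

```yaml
scrape_configs:
  - job_name: openstack-instances          # hypothetical job name
    openstack_sd_configs:
      - role: instance                     # discover Nova instances
        region: RegionOne                  # placeholder region
        identity_endpoint: https://keystone.example.org:5000/v3  # placeholder
        username: prometheus-sd            # hypothetical read-only account
        password: CHANGE_ME
        domain_name: default
        project_name: observer             # hypothetical project
        all_tenants: true                  # list instances from every project (needs admin rights)
        port: 9100                         # assumes node exporter on each VM
    relabel_configs:
      # Copy the OpenStack project ID onto every scraped series.
      - source_labels: [__meta_openstack_project_id]
        target_label: project
      # Use the instance name instead of the IP:port address.
      - source_labels: [__meta_openstack_instance_name]
        target_label: instance
      # Skip instances that are not running.
      - source_labels: [__meta_openstack_instance_status]
        regex: ACTIVE
        action: keep
```

With a `project` label on every target, alerts whose expressions preserve that label would carry it too, so an Alertmanager route could match on it to select a per-project receiver.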

Federation

https://prometheus.io/docs/prometheus/latest/federation/
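
As a rough illustration of the federation option, a central Prometheus server could scrape the `/federate` endpoint of one of the existing servers. The sketch below follows the pattern in the linked documentation; the target hostname is a placeholder, and the `/tools/federate` path is an assumption based on the `/tools/` prefix seen in the API URLs above:

```yaml
scrape_configs:
  - job_name: federate-tools               # hypothetical job name
    scrape_interval: 60s
    honor_labels: true                     # keep the labels set by the source server
    metrics_path: /tools/federate          # assumed path prefix
    params:
      'match[]':
        - '{job="node"}'                   # pull only node exporter series
    static_configs:
      - targets:
          - tools-prometheus.example.org:9902   # placeholder host
```

Restricting `match[]` to the series that are actually needed keeps the federated scrape small; pulling every series from a source server is generally discouraged.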