Nova Resource:Metricsinfra/SAL

2024-03-13

  • 12:14 taavi: MariaDB [prometheusconfig]> delete from alerts where name = 'GridQueueProblem'; # T314664

2023-11-30

  • 18:53 taavi: no longer send quarry alerts to cloud services team

2023-11-18

  • 14:09 taavi: reboot metricsinfra-alertmanager-1 to see if it stops flapping a puppet alert

2023-09-29

  • 08:24 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
  • 08:17 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
  • 08:17 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
  • 08:16 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console

2023-05-10

  • 17:17 wm-bot2: Increased quotas by 8 cores, 16384 ram (T336423) - cookbook ran by taavi@runko

2023-05-04

  • 15:11 dcaro: rebooting metricsinfra-prometheus-2 as it was unresponsive

2023-04-24

  • 14:16 dcaro: rebooting metricsinfra-prometheus-2, it's in a non-responsive state (no ssh, console hangs)

2023-04-21

  • 21:58 andrewbogott: added raymond-ndibe as project member

2023-03-07

  • 16:31 wm-bot2: removed instance metricsinfra-controller-1 - cookbook ran by dcaro@vulcanus

2023-02-13

  • 23:37 bd808: metricsinfra-db-1.trove.eqiad1.wikimedia.cloud restarted via Horizon
  • 23:35 bd808: metricsinfra-db-1.trove.eqiad1.wikimedia.cloud not responsive to ssh
  • 23:32 bd808: grafana.wmcloud.org offline with db connection error. Investigating.

2022-12-20

  • 15:59 dcaro: rebooting prometheus-2 due to being non-responsive

2022-06-16

  • 14:18 taavi: add 'gitlab-runners' project to list of scraped projects

2022-03-01

  • 11:38 dcaro: Reloading alertmanager to refresh new config (T302702)
  • 11:37 dcaro: Adding runbook url annotation to GridQueueProblem alert on DB at metricsinfra-controller-1 (T302702)

2022-01-22

  • 11:32 taavi: added project-proxy VMs to prometheus targets

2021-12-14

  • 09:27 majavah: drop "analytics" project from current beta coverage, the setup is currently not compatible with pontoon

2021-09-11

  • 08:41 majavah: silence deployment-prep alerts yet again

2021-07-12

  • 15:45 bstorm: silenced deployment prep alerts for another 60 days

2021-06-15

  • 16:12 balloons: add 8 CPU/16G RAM to quota T284973

2021-06-14

  • 18:40 balloons: Add majavah as projectadmin T284938

2021-03-11

  • 18:33 bstorm: silenced alerts from deployment-prep for another 60 days

2021-01-04

  • 15:50 bstorm: silencing all alerts from deployment-prep for 60 more days

2020-09-29

  • 16:53 bstorm: silence all the deployment-prep alerts for another 30 days