Server Admin Log/Archive 61

From Wikitech

2022-12-31

2022-12-30

  • 21:36 dcausse: restarting blazegraph on wdqs1006 and wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)

2022-12-29

  • 23:26 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:25 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 23:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam
  • 09:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam

2022-12-22

  • 18:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 18:27 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:16 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1015.eqiad.wmnet with reason: host reimage
  • 18:00 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1015.eqiad.wmnet with reason: host reimage
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas ulsfo decom - robh@cumin2002"
  • 17:24 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas ulsfo decom - robh@cumin2002"
  • 17:22 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 16:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 16:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:23 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool inference in codfw: maintenance
  • 16:18 elukey@cumin1001: START - Cookbook sre.discovery.service-route pool inference in codfw: maintenance
  • 16:17 elukey@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool inference in eqiad: maintenance
  • 16:17 elukey@cumin1001: START - Cookbook sre.discovery.service-route depool inference in eqiad: maintenance
  • 16:15 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool inference in codfw: maintenance
  • 16:10 elukey@cumin1001: START - Cookbook sre.discovery.service-route depool inference in codfw: maintenance
  • 16:09 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check inference: maintenance
  • 16:09 elukey@cumin1001: START - Cookbook sre.discovery.service-route check inference: maintenance
  • 16:00 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 14:47 akosiaris: truncate daemon.log.1 on maps1009 to free up disk space
  • 14:43 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 14:42 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 13:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet with OS bullseye
  • 13:18 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:49 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1014.eqiad.wmnet with reason: host reimage
  • 12:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1014.eqiad.wmnet with reason: host reimage
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1014.eqiad.wmnet with OS bullseye
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet with OS bullseye
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 11:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17806
  • 11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17806
  • 11:09 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 10:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1013.eqiad.wmnet with reason: host reimage
  • 10:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1013.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1013.eqiad.wmnet with OS bullseye
  • 08:11 moritzm: installing libksba security updates
  • 07:44 vgutierrez: restarting varnish on cp4052 to clear VarnishChildRestarted alert - T325797
  • 07:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56286
  • 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56286
  • 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17806
  • 07:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17806
  • 07:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20485
  • 07:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 07:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28398
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 28398
  • 01:54 mbsantos: Update verbiage of fundraising banner (T325690)
  • 01:53 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 01:52 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 01:52 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 01:51 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 01:50 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 01:50 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply

2022-12-21

  • 23:41 ejegg: civicrm upgraded from d80f9550 to e3405a4e
  • 20:34 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:33 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 moritzm: installing nano bugfix updates from Bullseye point release
  • 13:32 moritzm: installing node-minimatch security updates
  • 12:02 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=1; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet,service=canary
  • 11:50 moritzm: instaling libde265 security updates
  • 11:45 moritzm: installing libbluray bugfix update for buster
  • 11:40 moritzm: installing joblib security updates
  • 11:11 moritzm: installing php7.3 security updates on buster
  • 09:55 jayme: scaling shellbox-syntaxhighlight back to 12 replicas
  • 09:09 jayme: correction: increasing replicas of shellbox-syntaxhighlight from 12 to 40
  • 09:08 jayme: increasing replicas of shellbox-syntaxhighlight from 12 to 50
  • 08:32 ryankemper: Downtiming wdqs 20[09-12] until 2023-01-02 (these are new hosts not yet properly brought into service)
  • 00:10 eileen: + fc05361 Check for correct fin type name in hasEndowment

2022-12-20

  • 23:22 ryankemper: [WDQS] Powercycling `wdqs1005`
  • 23:20 eileen: evision changed from 6d8c98a7 to 52a513cb
  • 20:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:09 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:00 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 19:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 18:07 mutante: otrs1001 - upgraded clamav daemon package, manually removed run dir and pid file, stopped, started clamav daemon
  • 16:50 dancy@deploy1002: scap failed: NameError name 'SyslogFormatter' is not defined (duration: 00m 00s)
  • 16:49 dancy@deploy1002: scap failed: NameError name 'SyslogFormatter' is not defined (duration: 00m 00s)
  • 16:46 dancy@deploy1002: scap failed: NameError name 'logging' is not defined (duration: 00m 00s)
  • 16:19 dancy@deploy1002: backport aborted: (duration: 00m 01s)
  • 16:17 dancy@deploy1002: backport aborted: (duration: 00m 04s)
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet with OS bullseye
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 15:59 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1012.eqiad.wmnet with reason: host reimage
  • 15:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1012.eqiad.wmnet with reason: host reimage
  • 15:25 moritzm: installing jupyter-core security updates
  • 14:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1012.eqiad.wmnet with OS bullseye
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet with OS bullseye
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:29 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:19 moritzm: installing libpgjava security updates
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1011.eqiad.wmnet with reason: host reimage
  • 14:16 moritzm: installing jackson-databind security updates
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1011.eqiad.wmnet with reason: host reimage
  • 14:10 moritzm: installing ruby-rails-html-sanitizer security updates
  • 13:04 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:58 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:13 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1011.eqiad.wmnet with OS bullseye
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 11:16 moritzm: installing apache2 security updates on Buster
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
  • 10:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
  • 10:16 moritzm: rebalance ganeti cluster in ulsfo after adding new node and decom of the old hardware T317247
  • 10:06 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:05 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:45 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 04:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 03:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:02 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 01:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 00:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 00:38 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 00:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)

2022-12-19

  • 23:50 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
  • 23:32 ryankemper: [WDQS] Temporarily removing wdqs20[09-12] from pybal; these are new hosts that aren't ready for service until data reload has completed (long-running process). In meantime, remove these so they don't factor into pybal's depool threshold
  • 23:30 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
  • 23:30 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2003.codfw.wmnet with OS bullseye
  • 23:01 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:01 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:59 ryankemper: [WDQS] Continuing with reboot of WDQS hosts. Doing 1 host each of `[eqiad, codfw]` X `[internal, public]`, so 4 total hosts at once
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:56 ryankemper: [WDQS] Pooled `wdqs2005`
  • 22:43 ryankemper: [WDQS] Pooled `wdqs2007` (was depooled, we may have forgotten to re-pool it in the last week or so)
  • 22:38 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:34 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:24 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:23 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev[1004-1006].eqiad.wmnet
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 22:09 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 22:06 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:58 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev[1004-1006].eqiad.wmnet
  • 21:25 jhuneidi@deploy1002: Installation of scap version "4.30.3" completed for 563 hosts
  • 21:25 jhuneidi@deploy1002: Installing scap version "4.30.3" for 563 hosts
  • 20:42 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 20:42 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 20:39 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 20:38 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 20:32 eileen: civicrm upgraded from 98b48b9a to 3978bd6c
  • 19:33 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: host reimage
  • 19:30 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: host reimage
  • 19:03 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2003.codfw.wmnet with OS bullseye
  • 18:29 sukhe: running authdns-update for Gerrit: 869277: T188561
  • 16:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
  • 16:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
  • 16:44 moritzm: installing node-tar security updates
  • 16:40 elukey@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
  • 16:40 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 16:38 moritzm: installing node-json-schema security updates
  • 16:29 moritzm: installing virglrenderer security updates
  • 16:21 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 16:19 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
  • 15:54 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 15:54 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 15:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
  • 15:12 thcipriani@deploy1002: Finished scap: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537) (duration: 09m 31s)
  • 15:04 thcipriani@deploy1002: thcipriani and kemayo: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:02 thcipriani@deploy1002: Started scap: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537)
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2129 main traffic weight', diff saved to https://phabricator.wikimedia.org/P42725 and previous config saved to /var/cache/conftool/dbconfig/20221219-145357-marostegui.json
  • 14:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:37 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:34 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:34 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:33 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
  • 14:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1011-1015].eqiad.wnet
  • 14:20 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-presto[1011-1015].eqiad.wnet
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
  • 14:08 oblivian@deploy1002: Synchronized README: Null sync to force a redeployment of the php-fpm base image (duration: 13m 04s)
  • 14:06 moritzm: installing giflib security updates
  • 13:58 moritzm: installing glibc security updates
  • 13:42 _joe_: purge old docker images from deploy1002 by hand
  • 13:27 moritzm: installing PHP 7.3 security updates on buster
  • 13:19 phedenskog@deploy1002: Finished deploy [performance/navtiming@6aedc70]: (no justification provided) (duration: 00m 08s)
  • 13:19 phedenskog@deploy1002: Started deploy [performance/navtiming@6aedc70]: (no justification provided)
  • 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1006-1010].eqiad.wnet
  • 12:13 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wnet
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five
  • 12:09 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five
  • 11:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test
  • 11:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test
  • 11:29 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance
  • 11:29 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 11:29 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 11:29 elukey@cumin1001: START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance
  • 11:27 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:51 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:36 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s)
  • 10:33 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d
  • 10:28 taavi@deploy1002: Finished scap: Backport for Only preload getPageData if there's thread data for the page (T325477) (duration: 07m 58s)
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0)
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 10:23 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 10:23 elukey@cumin1001: START - Cookbook sre.k8s.pool-depool-cluster
  • 10:21 taavi@deploy1002: taavi and taavi: Backport for Only preload getPageData if there's thread data for the page (T325477) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:20 taavi@deploy1002: Started scap: Backport for Only preload getPageData if there's thread data for the page (T325477)
  • 09:59 moritzm: update bullseye netboot image for Bullseye 11.6 point release T325186
  • 09:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s)
  • 09:48 aqu@deploy1002: Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269]
  • 09:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s)
  • 09:47 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269]
  • 09:30 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s)
  • 09:29 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff]
  • 09:29 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s)
  • 09:21 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff]
  • 09:20 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s)
  • 09:19 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff]
  • 09:17 aqu: About to deploy analytics/refinery (bug fix in HDFS usage pipeline)
  • 09:15 ladsgroup@deploy1002: Finished scap: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477) (duration: 09m 24s)
  • 09:14 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:07 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 09:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:05 ladsgroup@deploy1002: Started scap: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)
  • 09:04 dcausse: restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 09:03 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:02 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:56 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:33 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 29535
  • 08:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29535
  • 08:02 moritzm: installing openexr security updates
  • 07:22 phedenskog@deploy1002: Finished deploy [performance/navtiming@5770d46]: (no justification provided) (duration: 00m 08s)
  • 07:22 phedenskog@deploy1002: Started deploy [performance/navtiming@5770d46]: (no justification provided)
  • 07:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 28398
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 28398
  • 07:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11686
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11686
  • 03:08 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable wgDiscussionToolsABTest T325477 T321961 (duration: 15m 23s)

2022-12-18

  • 19:40 sukhe: ran sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm' [at 18:55 UTC]: T325477
  • 19:31 sukhe: [FINISHED] running sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm'
  • 19:31 sukhe: [FINISHED] running sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm'
  • 18:55 sukhe: running sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm'
  • 18:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 18:28 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 18:20 cgoubert@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 18:20 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers

2022-12-17

  • 14:36 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:30 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .

2022-12-16

  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bullseye
  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
  • 19:00 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
  • 18:44 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 18:41 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 18:19 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bullseye
  • 18:08 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4007']
  • 17:56 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4007']
  • 17:53 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4007']
  • 17:52 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4007']
  • 17:28 mutante: etherpad1003 - testing new: sudo systemctl start wmf_auto_restart_envoyproxy
  • 16:46 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4007.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4007.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:24 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gaenti4007 - robh@cumin2002"
  • 16:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gaenti4007 - robh@cumin2002"
  • 16:20 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
  • 16:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
  • 15:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@44d4e81]: Fix subtle bug on image_suggestions when resolving varprop. (duration: 00m 09s)
  • 15:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@44d4e81]: Fix subtle bug on image_suggestions when resolving varprop.
  • 13:48 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 12:32 jbond: deploy update to webproxy https://gerrit.wikimedia.org/r/c/operations/puppet/+/868372
  • 12:09 volans: run homer on cr[1-2]-{eqiad,codfw} to allow SSH from cloudcumin hosts to cloud hosts
  • 12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28398
  • 12:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28398
  • 11:19 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 11:16 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 10:51 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 09:44 moritzm: import terraform 1.3.6 to thirdparty/terraform for buster/bullseye T322344
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti5003.eqsin.wmnet
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 08:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5003.eqsin.wmnet
  • 08:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 08:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 08:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 08:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 08:35 moritzm: power down ganeti5003 manually (mgmt/IPMI broken) for pending decom T322048
  • 08:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2002.wikimedia.org
  • 08:18 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 00:51 mutante: puppetmasters - merged gerrit:868481 to "revert" gerrit:866644,ran puppet and 'systemctl reset-failed' via cumin on 10 masters, resolved monitoring alerts
  • 00:22 mutante: aphlict1001 - systemctl start man-db - T325246
  • 00:21 mutante: aphlict1001 - systemctl start logrotate - T325246
  • 00:21 mutante: aphlict1001 - :/var/log/aphlict# truncate aphlict.log --size 100M - T325246

2022-12-15

  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netmon1002.wikimedia.org
  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netmon1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 23:09 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netmon1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 23:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:00 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon1002.wikimedia.org
  • 22:49 TheresNoTime: `[samtar@mwmaint1002 imports]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Coffeeandcrumbs /home/samtar/imports` T325330
  • 22:48 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 22:48 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:47 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:42 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 22:11 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 22:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:10 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 21:59 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:59 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:52 TheresNoTime: done UTC late backport and config training
  • 21:47 samtar@deploy1002: Finished scap: Backport for NewImpact: Use "View all edits" in footer (T325216) (duration: 24m 38s)
  • 21:42 TheresNoTime: logging to note that this deploy is unusually slow, P42715
  • 21:36 samtar@deploy1002: samtar and tgr: Backport for NewImpact: Use "View all edits" in footer (T325216) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:25 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 21:24 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:23 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:22 samtar@deploy1002: Started scap: Backport for NewImpact: Use "View all edits" in footer (T325216)
  • 21:17 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 21:16 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:15 cwhite@cumin2002: conftool action : set/pooled=no; selector: name=logstash2024.codfw.wmnet,service=kibana7
  • 21:11 samtar@deploy1002: Finished scap: Backport for Update reference to CommandLineInc (T184782) (duration: 08m 18s)
  • 21:04 samtar@deploy1002: samtar and zabe: Backport for Update reference to CommandLineInc (T184782) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:03 samtar@deploy1002: Started scap: Backport for Update reference to CommandLineInc (T184782)
  • 20:22 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:21 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 20:20 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:19 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 20:02 denisse@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 19:35 mutante: short downtime for misc websites, iegreview, racktables, transparency.wm, annual.wm, design.wm, sitemaps.wm, research.wm, bienvenida.wm, wikiworkshop.org
  • 19:08 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@d23127b]: Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance. (duration: 00m 10s)
  • 19:07 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@d23127b]: Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance.
  • 18:50 robh: starting msw2-ulsfo swap for rack .23, mgmt will flap with no expected user impact
  • 18:48 denisse@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 18:37 denisse@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-codfw cluster: Reboot kafka nodes
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet with OS bullseye
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:07 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:05 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:04 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2002.codfw.wmnet with OS bullseye
  • 17:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1010.eqiad.wmnet with reason: host reimage
  • 17:48 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1010.eqiad.wmnet with reason: host reimage
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db2129', diff saved to https://phabricator.wikimedia.org/P42714 and previous config saved to /var/cache/conftool/dbconfig/20221215-174537-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2129', diff saved to https://phabricator.wikimedia.org/P42713 and previous config saved to /var/cache/conftool/dbconfig/20221215-173713-ladsgroup.json
  • 17:25 denisse@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: host reimage
  • 17:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb2002.codfw.wmnet with reason: reboot
  • 17:24 mutante: miscweb2002 - passive host, rebooting
  • 17:24 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb2002.codfw.wmnet with reason: reboot
  • 17:21 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: host reimage
  • 17:17 volans: manually removed /etc/spicerack/redis_cluster/sessions.yaml from the cumin hosts
  • 17:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on phab2002.codfw.wmnet with reason: reboot
  • 17:06 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on phab2002.codfw.wmnet with reason: reboot
  • 17:04 mutante: phab2002 (non active phabricator server) - rebooting
  • 16:58 mutante: aphlict1001 - aphlict service did not come back after rebooting machine. fix was to manually 'rm /var/run/aphlict/aphlict.pid' and 'systemtctl start aphlict'
  • 16:56 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
  • 16:54 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2002.codfw.wmnet with OS bullseye
  • 16:52 mutante: aphlict1001 - rebooting - this could mean for a minute there are dropped notifications about Phabricator tickets
  • 16:50 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
  • 16:45 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1010.eqiad.wmnet with OS bullseye
  • 16:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 01m 44s)
  • 16:42 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
  • 16:36 volans: upgrading spicerack to v6.0.0 on cumin1001
  • 16:34 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.0.0
  • 16:34 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.0.0
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 16:24 volans: upgrading spicerack to v6.0.0 on cumin2002
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 16:22 sukhe@cumin2002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM durum1001.eqiad.wmnet
  • 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
  • 16:17 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs2009.codfw.wmnet
  • 16:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 16:02 moritzm: installing glibc security updates on bullseye
  • 16:02 nemo-yiannis: switching maps/kartotherian back to codfw
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 15:57 moritzm: installing openexr security updates
  • 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 15:41 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
  • 15:37 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:37 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:34 volans: temporary disabling puppet on A:cumin-all to deploy spicerack v6.0.0
  • 15:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:24 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum1001.eqiad.wmnet
  • 15:22 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:15 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:15 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:15 reedy@deploy1002: Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Two backports (duration: 06m 57s)
  • 15:12 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:12 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:05 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:05 moritzm: imported prometheus-jmx-exporter for bookworm-wikimedia T321783
  • 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 14:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 14:44 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
  • 14:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1004.eqiad.wmnet
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1004.eqiad.wmnet
  • 14:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1003.eqiad.wmnet
  • 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318) (duration: 09m 18s)
  • 14:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:26 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 14:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:25 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 14:23 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1003.eqiad.wmnet
  • 14:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318)
  • 14:17 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412) (duration: 10m 57s)
  • 14:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and ki: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412)
  • 13:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 13:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 13:34 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:06 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1001.eqiad.wmnet with reason: host reimage
  • 12:48 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1001.eqiad.wmnet with reason: host reimage
  • 12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 12:20 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 12:19 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kartotherian,name=maps1010.eqiad.wmnet
  • 12:19 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kartotherian-ssl,name=maps1010.eqiad.wmnet
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
  • 12:08 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@00c9a16] (eqiad): codfw: Disable traffic mirroring (duration: 01m 00s)
  • 12:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@00c9a16] (eqiad): codfw: Disable traffic mirroring
  • 12:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 11:43 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 11:42 effie: switching maps/kartotherian from codfw to eqiad
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 11:39 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@00c9a16] (codfw): codfw: Disable traffic mirroring (duration: 01m 43s)
  • 11:37 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@00c9a16] (codfw): codfw: Disable traffic mirroring
  • 11:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flowspec1001.eqiad.wmnet
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host flowspec1001.eqiad.wmnet
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3002.esams.wmnet
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping3002.esams.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2002.codfw.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2002.codfw.wmnet
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1002.eqiad.wmnet
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1002.eqiad.wmnet
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
  • 10:47 XioNoX: disable ping offload in eqiad
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
  • 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 10:34 jayme: restarted istiod pods in aux-k8s because of T303184
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 09:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:54 effie: stopping and masking nutcracker on mw servers - T277183
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:51 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2001.codfw.wmnet
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
  • 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
  • 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:30 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host acmechief-test1001.eqiad.wmnet
  • 09:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:27 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
  • 09:12 akosiaris: reboot rdb2007 for kernel upgrades
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
  • 09:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.14 refs T320519
  • 08:53 akosiaris: reboot rdb2009 for kernel upgrades
  • 08:52 akosiaris: correction: reboot rdb1011 for kernel upgrades
  • 08:51 akosiaris: reboot rdb1007 for kernel upgrades
  • 08:51 akosiaris: nothing noticed with rdb1007 reboot for mw, jobqueue, api-gateway. changeprop had a minor backlog increase, but everything appears fine now.
  • 08:28 akosiaris: reboot rdb1009 for kernel upgrades. possibly (but probably not) affected applications: changeprop, cpjobqueue, api-gateway, redisLockManager
  • 08:14 kartik@deploy1002: Finished scap: Backport for Enable Section Translation on 6 WPs (T319177) (duration: 10m 55s)
  • 08:08 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet
  • 08:04 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation on 6 WPs (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation on 6 WPs (T319177)
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 01:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:58 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage
  • 00:55 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage
  • 00:32 mutante: releases1002 - rebooting
  • 00:30 mutante: releases2002 - rebooting
  • 00:19 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 00:19 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:15 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 00:14 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:05 tgr: EU late backports done
  • 00:05 tgr@deploy1002: Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Backport: User impact: read edit count from primary db in save complete hook (T324930) (duration: 07m 03s)

2022-12-14

  • 23:50 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 23:48 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']
  • 23:44 ejegg: civicrm upgraded from a1c2630a to 98b48b9a
  • 23:41 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']
  • 23:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']
  • 23:33 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']
  • 23:29 ryankemper: [WDQS] Downtimed wdqs20[09-12] for the next 7 days
  • 23:28 ryankemper: T301167 wdqs2011/2012 were not visible in pybal (oversight from when I added the other hosts with conftool last week). Fixed that, so now all of the new hosts are showing up properly.
  • 23:27 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2012.*
  • 23:27 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2011.*
  • 23:14 bd808: Toolhub: rebuilding search indices following app update
  • 23:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 23:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on alert2001.wikimedia.org with reason: kernel update
  • 23:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 23:10 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on alert2001.wikimedia.org with reason: kernel update
  • 23:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 23:03 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org
  • 23:03 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org
  • 23:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 23:01 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 22:59 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 22:56 tgr: doing the last backport by hand due to T325252
  • 22:49 tgr@deploy1002: Finished scap: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp (duration: 11m 37s)
  • 22:46 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
  • 22:39 tgr@deploy1002: tgr and kharlan and tgr: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:37 tgr@deploy1002: Started scap: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp
  • 22:36 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting
  • 22:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting
  • 22:32 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 22:12 samtar@deploy1002: Finished scap: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537) (duration: 08m 12s)
  • 22:06 samtar@deploy1002: samtar and kemayo: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:04 samtar@deploy1002: Started scap: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537)
  • 22:03 samtar@deploy1002: Finished scap: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537) (duration: 08m 55s)
  • 21:56 samtar@deploy1002: samtar and kemayo: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:55 eileen: civicrm upgraded from a65bc00a to a1c2630a
  • 21:54 samtar@deploy1002: Started scap: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537)
  • 21:53 samtar@deploy1002: Finished scap: Backport for Parsoid: don't bypass ParserCache when using Title (duration: 11m 13s)
  • 21:44 samtar@deploy1002: samtar and daniel: Backport for Parsoid: don't bypass ParserCache when using Title synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:42 samtar@deploy1002: Started scap: Backport for Parsoid: don't bypass ParserCache when using Title
  • 21:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 21:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 21:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 21:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 21:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 21:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 21:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 21:17 samtar@deploy1002: backport aborted: (duration: 15m 35s)
  • 21:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 21:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 20:59 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 20:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 20:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 20:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:11 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash1036.eqiad.wmnet with OS buster
  • 19:30 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2001.codfw.wmnet with OS bullseye
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
  • 19:21 mutante: aphlict1001 - :/var/log/aphlict# gzip aphlict.log.1
  • 19:19 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash1037.eqiad.wmnet with OS buster
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
  • 19:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2027.codfw.wmnet with OS bullseye
  • 18:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2027.codfw.wmnet with reason: host reimage
  • 18:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2027.codfw.wmnet with reason: host reimage
  • 18:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: host reimage
  • 18:37 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: host reimage
  • 18:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2027.codfw.wmnet with OS bullseye
  • 18:22 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2027']
  • 18:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2027']
  • 18:12 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2027']
  • 18:09 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2001.codfw.wmnet with OS bullseye
  • 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 18:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2027']
  • 18:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 18:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2029.codfw.wmnet with OS bullseye
  • 18:01 volans: uploaded spicerack_6.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 17:45 effie: disable puppet on all P:mediawiki::nutcracker hosts (killing nutcracker on mw)
  • 17:41 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2029.codfw.wmnet with reason: host reimage
  • 17:38 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2029.codfw.wmnet with reason: host reimage
  • 17:34 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2028.codfw.wmnet with OS bullseye
  • 17:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 17:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 17:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 17:25 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6001.drmrs.wmnet
  • 17:22 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 17:22 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 17:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 17:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 17:19 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2029.codfw.wmnet with OS bullseye
  • 17:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes1011.eqiad.wmnet
  • 17:18 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus6001.drmrs.wmnet
  • 17:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2029']
  • 17:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 17:15 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5001.eqsin.wmnet
  • 17:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 17:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1011.eqiad.wmnet
  • 17:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 17:12 mutante: https://doc.wikimedia.org - maybe a few seconds of downtime
  • 17:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add aux-k8s-ingress VIP - cgoubert@cumin1001"
  • 17:11 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2028.codfw.wmnet with reason: host reimage
  • 17:11 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2029']
  • 17:11 mutante: doc2001 - rebooting
  • 17:10 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add aux-k8s-ingress VIP - cgoubert@cumin1001"
  • 17:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
  • 17:09 mutante: planet1002 - rebooting
  • 17:09 mutante: planet2002 - rebooting
  • 17:09 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2029']
  • 17:08 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5001.eqsin.wmnet
  • 17:08 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2028.codfw.wmnet with reason: host reimage
  • 17:08 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 17:07 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 17:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 17:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 17:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
  • 17:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 17:03 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4001.ulsfo.wmnet
  • 17:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
  • 17:01 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2029']
  • 16:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
  • 16:58 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 16:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1036.eqiad.wmnet with OS buster
  • 16:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:57 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4001.ulsfo.wmnet
  • 16:56 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 16:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
  • 16:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 16:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 16:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 16:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
  • 16:50 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.14 refs T320519 (duration: 07m 06s)
  • 16:49 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 16:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 16:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 16:48 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2028.codfw.wmnet with OS bullseye
  • 16:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
  • 16:46 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:43 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2028']
  • 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 16:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host orespoolcounter1003.eqiad.wmnet
  • 16:43 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.14 refs T320519
  • 16:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 16:41 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 16:41 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 16:40 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be1001.eqiad.wmnet
  • 16:36 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2028']
  • 16:36 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be1001.eqiad.wmnet
  • 16:35 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2028']
  • 16:33 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
  • 16:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2028']
  • 16:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:25 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
  • 16:24 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
  • 16:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:19 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
  • 16:17 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:17 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:15 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
  • 16:10 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
  • 16:06 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
  • 16:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:51 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 15:50 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 15:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 15:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2003.codfw.wmnet
  • 15:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2003.codfw.wmnet
  • 15:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2001.codfw.wmnet
  • 15:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2001.codfw.wmnet
  • 15:08 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 15:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 14:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui2001.codfw.wmnet
  • 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui2001.codfw.wmnet
  • 14:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P42707 and previous config saved to /var/cache/conftool/dbconfig/20221214-142958-root.json
  • 14:20 ladsgroup@deploy1002: Finished scap: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137) (duration: 08m 12s)
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42706 and previous config saved to /var/cache/conftool/dbconfig/20221214-141644-root.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P42705 and previous config saved to /var/cache/conftool/dbconfig/20221214-141453-root.json
  • 14:13 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:13 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 14:11 ladsgroup@deploy1002: Started scap: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137)
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs1003.eqiad.wmnet
  • 14:11 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs1003.eqiad.wmnet
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42704 and previous config saved to /var/cache/conftool/dbconfig/20221214-140139-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P42703 and previous config saved to /var/cache/conftool/dbconfig/20221214-135948-root.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42702 and previous config saved to /var/cache/conftool/dbconfig/20221214-134634-root.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P42701 and previous config saved to /var/cache/conftool/dbconfig/20221214-134443-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42700 and previous config saved to /var/cache/conftool/dbconfig/20221214-133129-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P42699 and previous config saved to /var/cache/conftool/dbconfig/20221214-132938-root.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42697 and previous config saved to /var/cache/conftool/dbconfig/20221214-131624-root.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P42696 and previous config saved to /var/cache/conftool/dbconfig/20221214-131433-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42695 and previous config saved to /var/cache/conftool/dbconfig/20221214-130119-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 T325154', diff saved to https://phabricator.wikimedia.org/P42694 and previous config saved to /var/cache/conftool/dbconfig/20221214-125950-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P42693 and previous config saved to /var/cache/conftool/dbconfig/20221214-125928-root.json
  • 12:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T325154', diff saved to https://phabricator.wikimedia.org/P42692 and previous config saved to /var/cache/conftool/dbconfig/20221214-125544-marostegui.json
  • 12:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:38 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662) (duration: 29m 18s)
  • 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
  • 12:29 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1001.eqiad.wmnet
  • 12:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:27 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host pki2001.codfw.wmnet
  • 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb1002.eqiad.wmnet
  • 12:20 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
  • 12:19 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
  • 12:18 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1001.eqiad.wmnet
  • 12:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
  • 12:11 jbond: disable puppet fleet wide to preform server reboots
  • 12:09 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662)
  • 11:59 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
  • 11:55 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 11:49 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 11:46 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 11:42 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 11:42 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 11:38 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 11:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 11:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 11:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 11:19 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ml-cache2002.codfw.wmnet
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 11:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 11:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 11:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
  • 11:03 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 11:00 moritzm: installing dpkg bugfix updates from Bullseye point release
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
  • 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
  • 10:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 10:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
  • 10:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
  • 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6001.wikimedia.org
  • 10:31 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 10:31 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 10:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6001.wikimedia.org
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 10:24 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2003.codfw.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2002.codfw.wmnet
  • 10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2002.codfw.wmnet
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
  • 10:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
  • 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
  • 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 09:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2001.codfw.wmnet
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 09:53 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 09:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 09:46 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.14/includes/search/SearchResultThumbnailProvider.php: Backport: search: Avoid setting height in search thumbnails (T322621) (duration: 08m 07s)
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
  • 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
  • 09:28 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6001.wikimedia.org
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 09:25 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6001.wikimedia.org
  • 09:20 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:41 hashar: Restarted Gerrit for a plugin update
  • 08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068 (duration: 00m 09s)
  • 08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068
  • 08:39 aqu@deploy1002: Finished deploy [airflow-dags/analytics@353573b]: HDFS usage dataset pipeline deployment without superuser [airflow-dags@353573b] (duration: 00m 13s)
  • 08:39 aqu@deploy1002: Started deploy [airflow-dags/analytics@353573b]: HDFS usage dataset pipeline deployment without superuser [airflow-dags@353573b]
  • 08:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@353573b]: HDFS usage dataset pipeline deployment without superuser TEST [airflow-dags@353573b] (duration: 00m 10s)
  • 08:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@353573b]: HDFS usage dataset pipeline deployment without superuser TEST [airflow-dags@353573b]
  • 08:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068 (duration: 00m 11s)
  • 08:25 hashar@deploy1002: Started deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068
  • 07:28 phedenskog@deploy1002: Finished deploy [performance/navtiming@7ba179f]: (no justification provided) (duration: 00m 08s)
  • 07:28 phedenskog@deploy1002: Started deploy [performance/navtiming@7ba179f]: (no justification provided)
  • 01:22 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS bullseye
  • 01:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS bullseye
  • 01:06 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 01:04 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 01:02 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 01:01 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 00:46 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS bullseye
  • 00:45 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS bullseye

2022-12-13

  • 22:57 jdrewniak@deploy1002: Finished scap: Backport for Account for syntax errors in closest selector (T325113) (duration: 08m 29s)
  • 22:50 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for Account for syntax errors in closest selector (T325113) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 22:48 jdrewniak@deploy1002: Started scap: Backport for Account for syntax errors in closest selector (T325113)
  • 22:42 jdrewniak@deploy1002: Finished scap: Backport for Account for syntax errors in closest selector (T325113) (duration: 09m 20s)
  • 22:40 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "parse1002: failed -> active - dzahn@cumin2002"
  • 22:40 mutante: netbox: set parse1002 status: failed -> active in web UI; ran cookbook 'sre.puppet.sync-netbox-hiera' to get data in sync - T324949
  • 22:38 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "parse1002: failed -> active - dzahn@cumin2002"
  • 22:35 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for Account for syntax errors in closest selector (T325113) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:34 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:33 jdrewniak@deploy1002: Started scap: Backport for Account for syntax errors in closest selector (T325113)
  • 22:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2035.codfw.wmnet with OS bullseye
  • 22:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2033.codfw.wmnet with OS bullseye
  • 22:12 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2034.codfw.wmnet with OS bullseye
  • 22:10 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1] (hadoop-test): HDFS FSImage conversion to XML script TEST [analytics/refinery@66736e1] (duration: 01m 11s)
  • 22:09 aqu@deploy1002: Started deploy [analytics/refinery@66736e1] (hadoop-test): HDFS FSImage conversion to XML script TEST [analytics/refinery@66736e1]
  • 22:08 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1] (thin): HDFS FSImage conversion to XML script THIN [analytics/refinery@66736e1] (duration: 00m 07s)
  • 22:08 aqu@deploy1002: Started deploy [analytics/refinery@66736e1] (thin): HDFS FSImage conversion to XML script THIN [analytics/refinery@66736e1]
  • 22:06 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1]: HDFS FSImage conversion to XML script [analytics/refinery@66736e1] (duration: 26m 32s)
  • 21:53 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 21:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: extension distributor updates (duration: 06m 50s)
  • 21:50 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 21:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 21:47 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 21:43 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 21:41 kindrobot: Finishing UTC late backport window
  • 21:40 kindrobot@deploy1002: Finished scap: Backport for Start writing to cul_actor everywhere (T233004) (duration: 18m 47s)
  • 21:40 aqu@deploy1002: Started deploy [analytics/refinery@66736e1]: HDFS FSImage conversion to XML script [analytics/refinery@66736e1]
  • 21:39 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 21:35 aqu: Deploying analytics/refinery (HDFS FSImage conversion to XML script)
  • 21:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2033.codfw.wmnet with OS bullseye
  • 21:30 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2034.codfw.wmnet with OS bullseye
  • 21:23 kindrobot@deploy1002: kindrobot and zabe: Backport for Start writing to cul_actor everywhere (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2035.codfw.wmnet with OS bullseye
  • 21:21 kindrobot@deploy1002: Started scap: Backport for Start writing to cul_actor everywhere (T233004)
  • 21:17 samtar@deploy1002: Finished scap: Backport for Child elements also trigger previews (T325007) (duration: 09m 38s)
  • 21:09 samtar@deploy1002: samtar and jdlrobson: Backport for Child elements also trigger previews (T325007) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:08 samtar@deploy1002: Started scap: Backport for Child elements also trigger previews (T325007)
  • 20:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling restart
  • 20:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling restart
  • 20:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint1001.wikimedia.org
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 19:00 ladsgroup@deploy1002: Finished scap: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test (duration: 07m 53s)
  • 18:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:54 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:52 ladsgroup@deploy1002: Started scap: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test
  • 18:51 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts contint1001.wikimedia.org
  • 18:51 mutante: decom'ing contint1001 (formerly prod CI) server, replaced by contint1002 T324698
  • 18:47 ladsgroup@deploy1002: Finished scap: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys (duration: 09m 25s)
  • 18:40 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 18:38 ladsgroup@deploy1002: Started scap: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys
  • 18:21 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:26 btullis: edited automation/proxies/ttyS1-115200.conf to remove `include "/etc/dhcp/automation/ttyS1-115200/kafka-stretch1001.conf";`and restarted isc-dhc-server
  • 17:22 btullis: btullis@install1003:/etc/dhcp/automation/ttyS1-115200$ sudo systemctl restart isc-dhcp-server.service T314156
  • 16:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:17 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 16:15 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 16:12 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 16:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[01234].eqiad.wmnet
  • 16:03 moritzm: installing ruby-tzinfo security updates
  • 16:01 hnowlan@puppetmaster1001: conftool action : set/weight=4:pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
  • 15:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20804
  • 15:54 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes101[234].eqiad.wmnet
  • 15:51 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 15:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20804
  • 15:27 derick@deploy1002: Finished scap: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034) (duration: 08m 59s)
  • 15:20 derick@deploy1002: derick and func: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 15:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 15:18 derick@deploy1002: Started scap: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034)
  • 15:02 derick@deploy1002: Finished scap: Backport for Log linter data while parsing full pages (T246403) (duration: 10m 28s)
  • 14:53 derick@deploy1002: derick and arlolra: Backport for Log linter data while parsing full pages (T246403) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:52 derick@deploy1002: Started scap: Backport for Log linter data while parsing full pages (T246403)
  • 14:50 derick@deploy1002: backport aborted: (duration: 07m 09s)
  • 14:41 derick@deploy1002: Finished scap: Backport for hewiki: set VisualEditor to direct mode (T320529) (duration: 14m 34s)
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics
  • 14:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics
  • 14:28 derick@deploy1002: derick and daniel: Backport for hewiki: set VisualEditor to direct mode (T320529) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:26 derick@deploy1002: Started scap: Backport for hewiki: set VisualEditor to direct mode (T320529)
  • 14:22 moritzm: added smunene to pwstore
  • 14:17 derick@deploy1002: Backport cancelled.
  • 14:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001
  • 14:03 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001
  • 13:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 13:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:49 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 13:48 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002
  • 13:28 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002
  • 12:31 claime: sessionstore outage being monitored
  • 12:23 claime: sessionstore outage, login functions severely impacted
  • 12:07 hashar: Gerrit now has CI job results represented in the Checks tab which should be a little nicer. The old HTML result table is gone and replaced by little bubbles representing the state of the builds for the latest patchset. Ref: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/3ULF5NPVC4MSVABZBSXAMDODLZUKFXHS/
  • 12:00 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart
  • 12:00 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart
  • 11:57 hashar: Restarted Gerrit on gerrit1001
  • 11:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 09s)
  • 11:55 hashar@deploy1002: Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068
  • 11:54 hashar: Restarted Gerrit on gerrit2002 (replica)
  • 11:52 hashar@deploy1002: Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 11s)
  • 11:52 hashar@deploy1002: Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:22 moritzm: installing paramiko security updates#
  • 11:17 claime: Puppet re-enabled on cp::text nodes - T290536
  • 10:58 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100% (duration: 01m 40s)
  • 10:57 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100%
  • 10:54 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics (duration: 02m 25s)
  • 10:51 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics
  • 10:36 vgutierrez: clean up stale prometheus target files in prometheus5001
  • 10:22 claime: puppet run on cp4037 - T290536
  • 10:21 claime: puppet disabled on cp hosts for T290536
  • 10:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:54 moritzm: installing libhttp-daemon-perl security updates
  • 09:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.14 refs T320519
  • 09:07 claime: Repooled parse1002.eqiad.wmnet in parsoid service - T324949
  • 09:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 09:05 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 09:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=parse1002.eqiad.wmnet
  • 08:58 moritzm: installing libpgjava security updates
  • 08:55 moritzm: installing xen security updates
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42683 and previous config saved to /var/cache/conftool/dbconfig/20221213-083019-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42682 and previous config saved to /var/cache/conftool/dbconfig/20221213-081514-root.json
  • 08:13 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in Chuvash Wikipedia (T319176) (duration: 10m 01s)
  • 08:05 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation in Chuvash Wikipedia (T319176) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation in Chuvash Wikipedia (T319176)
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42681 and previous config saved to /var/cache/conftool/dbconfig/20221213-080009-root.json
  • 07:52 ladsgroup@deploy1002: Finished scap: Backport for Reduce PC writes from parsoid API to 1% (duration: 09m 35s)
  • 07:45 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Reduce PC writes from parsoid API to 1% synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42680 and previous config saved to /var/cache/conftool/dbconfig/20221213-074504-root.json
  • 07:43 ladsgroup@deploy1002: Started scap: Backport for Reduce PC writes from parsoid API to 1%
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42679 and previous config saved to /var/cache/conftool/dbconfig/20221213-072959-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42678 and previous config saved to /var/cache/conftool/dbconfig/20221213-071454-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42677 and previous config saved to /var/cache/conftool/dbconfig/20221213-065949-root.json
  • 06:59 kart_: Updated cxserver to 2022-12-06-121330-production (T321781, T324534)
  • 06:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:54 marostegui: Reboot db1206 to test RAID controller
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42676 and previous config saved to /var/cache/conftool/dbconfig/20221213-065402-marostegui.json
  • 06:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:51 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:50 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:23 dwisehaupt: payments now using the new digicert certificate in eqiad
  • 04:56 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.12 (duration: 02m 15s)
  • 04:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.14 refs T320519 (duration: 52m 11s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.14 refs T320519

2022-12-12

  • 23:35 tzatziki: removing 3 files for legal compliance
  • 23:12 tzatziki: removing 2 files for legal compliance
  • 22:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 21:57 tzatziki: removing 1 file for legal compliance
  • 21:13 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 21:08 cstone: civicrm upgraded from 3ae68ab4 to 09925be0
  • 20:31 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin -b 4 wdqs2* 'systemctl restart wdqs-blazegraph'`
  • 17:59 ladsgroup@deploy1002: Finished scap: Backport for Disable writing parsoid html to PC on commons and wikidata. (duration: 07m 45s)
  • 17:53 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Disable writing parsoid html to PC on commons and wikidata. synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 17:52 ladsgroup@deploy1002: Started scap: Backport for Disable writing parsoid html to PC on commons and wikidata.
  • 16:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1004.eqiad.wmnet with OS bullseye
  • 16:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 16:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 16:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 16:04 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 16:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 16:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:01 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:59 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 15:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:46 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:36 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:09 krinkle@deploy1002: Finished scap: Backport for Add Largest Contentful Paint (LCP) (T319329) (duration: 08m 51s)
  • 15:06 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:02 krinkle@deploy1002: krinkle and krinkle: Backport for Add Largest Contentful Paint (LCP) (T319329) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:00 krinkle@deploy1002: Started scap: Backport for Add Largest Contentful Paint (LCP) (T319329)
  • 14:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 21804
  • 14:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21804
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 21320
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21320
  • 14:27 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Wikidata: don't show Vector search thumbnails (T316093) (duration: 09m 47s)
  • 14:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209275
  • 14:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209275
  • 14:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38757
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38757
  • 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20804
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20804
  • 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21804
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21804
  • 14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Wikidata: don't show Vector search thumbnails (T316093) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62887
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62887
  • 14:18 moritzm: rebalance Ganeti cluster in eqsin after adding ganeti5005-5007 T324610
  • 14:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11686
  • 14:17 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Wikidata: don't show Vector search thumbnails (T316093)
  • 14:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11686
  • 14:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add event stream config for ios.talk_page_interaction (T324340) (duration: 08m 42s)
  • 14:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a2ebe75] (codfw): Increase codfw mirrored traffic to 75% (duration: 01m 41s)
  • 14:12 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a2ebe75] (codfw): Increase codfw mirrored traffic to 75%
  • 14:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and tsev: Backport for Add event stream config for ios.talk_page_interaction (T324340) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add event stream config for ios.talk_page_interaction (T324340)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove for eventual decom
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove for eventual decom
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1004.eqiad.wmnet with reason: host reimage
  • 13:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1004.eqiad.wmnet with reason: host reimage
  • 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1004.eqiad.wmnet with OS bullseye
  • 12:49 moritzm: installing jackson-databind security updates
  • 12:48 moritzm: installing Django security updates
  • 12:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1003.eqiad.wmnet with OS bullseye
  • 12:19 moritzm: installing jqueryui security updates
  • 12:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1003.eqiad.wmnet with reason: host reimage
  • 12:13 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1003.eqiad.wmnet with reason: host reimage
  • 12:04 moritzm: installing twisted security updates
  • 11:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1003.eqiad.wmnet with OS bullseye
  • 11:49 moritzm: drain ganeti5003 for eventual decom T322048
  • 11:43 moritzm: failover Ganeti master in eqsin to ganeti5004 (5003 will be decommissioned) T322048
  • 11:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Bad CPU
  • 11:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Bad CPU
  • 11:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 11:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1002.eqiad.wmnet with OS bullseye
  • 10:52 claime: Switched parse1002 to parse1003 in parsoid-canary - T324949
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti5007.eqsin.wmnet
  • 10:47 ladsgroup@deploy1002: Finished scap: Backport for Bump portals to HEAD (duration: 09m 48s)
  • 10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1002.eqiad.wmnet with reason: host reimage
  • 10:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Bump portals to HEAD synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 10:39 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1002.eqiad.wmnet with reason: host reimage
  • 10:37 ladsgroup@deploy1002: Started scap: Backport for Bump portals to HEAD
  • 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:24 slyngs: update add-ldap-group tool, https://gerrit.wikimedia.org/r/c/operations/puppet/+/860568
  • 10:18 slyngs: Update modify-mfa tools, https://gerrit.wikimedia.org/r/c/operations/puppet/+/861385
  • 10:17 claime: depooled parse1002.eqiad.wmnet for hw failure - T324949
  • 10:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@bdc19a3] (codfw): Increase codfw mirrored traffic to 50% (duration: 01m 42s)
  • 10:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@bdc19a3] (codfw): Increase codfw mirrored traffic to 50%
  • 09:58 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
  • 09:06 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6af0d2d] (codfw): Increase codfw mirrored traffic to 25% (duration: 02m 15s)
  • 09:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6af0d2d] (codfw): Increase codfw mirrored traffic to 25%
  • 08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1002.eqiad.wmnet with OS bullseye
  • 08:25 matthiasmullie: UTC morning backports done
  • 08:24 mlitn@deploy1002: Finished scap: Backport for Add mediawiki.searchpreview schema (T321069) (duration: 18m 21s)
  • 08:16 mlitn@deploy1002: mlitn and mlitn: Backport for Add mediawiki.searchpreview schema (T321069) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:08 XioNoX: remove bast5001 from management routers ACLs (replaced by bast5002)
  • 08:06 mlitn@deploy1002: Started scap: Backport for Add mediawiki.searchpreview schema (T321069)
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42672 and previous config saved to /var/cache/conftool/dbconfig/20221212-074700-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42671 and previous config saved to /var/cache/conftool/dbconfig/20221212-073155-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42670 and previous config saved to /var/cache/conftool/dbconfig/20221212-071650-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42669 and previous config saved to /var/cache/conftool/dbconfig/20221212-070145-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42668 and previous config saved to /var/cache/conftool/dbconfig/20221212-064640-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42667 and previous config saved to /var/cache/conftool/dbconfig/20221212-063135-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42666 and previous config saved to /var/cache/conftool/dbconfig/20221212-061630-root.json

2022-12-10

  • 03:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 02:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 49 hosts with reason: Plugin upgrade for T322776
  • 02:00 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 49 hosts with reason: Plugin upgrade for T322776
  • 01:59 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 01:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776

2022-12-09

  • 23:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 22:27 cstone: payments-wiki upgraded from 35555b67 to 77297c12
  • 19:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:34 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:34 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 18:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:16 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk 2!
  • 18:16 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk 2!
  • 18:04 jnuche@deploy1002: Installation of scap version "4.30.2" completed for 562 hosts
  • 18:04 jnuche@deploy1002: Installing scap version "4.30.2" for 562 hosts
  • 17:59 jnuche@deploy1002: Installing scap version "4.30.2" for 563 hosts
  • 17:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk
  • 17:44 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk
  • 17:03 claime: eventgate-analytics bumped to 30 replicas to absorb increased load - T320518
  • 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:57 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 16:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 16:42 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:35 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 16:34 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:33 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 16:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42662 and previous config saved to /var/cache/conftool/dbconfig/20221209-134806-marostegui.json
  • 13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1001.eqiad.wmnet with OS bullseye
  • 13:13 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.13 refs T320518
  • 13:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1001.eqiad.wmnet with reason: host reimage
  • 13:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1001.eqiad.wmnet with reason: host reimage
  • 12:36 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1001.eqiad.wmnet with OS bullseye
  • 12:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6b70e03] (codfw): Reduce mirrored traffic to 5% (duration: 01m 39s)
  • 12:12 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6b70e03] (codfw): Reduce mirrored traffic to 5%
  • 11:58 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 11:58 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 11:41 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc] (codfw): Enable geopoints on production (duration: 01m 00s)
  • 11:40 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc] (codfw): Enable geopoints on production
  • 11:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2004.codfw.wmnet with OS bullseye
  • 11:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2004.codfw.wmnet with reason: host reimage
  • 11:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2004.codfw.wmnet with reason: host reimage
  • 11:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Decomissioning netmon2001 - cgoubert@cumin1001"
  • 11:09 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@17b9319] (codfw): codfw: Enable mirroring for 25% of the traffic (duration: 05m 08s)
  • 11:06 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Decomissioning netmon2001 - cgoubert@cumin1001"
  • 11:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@17b9319] (codfw): codfw: Enable mirroring for 25% of the traffic
  • 11:02 ladsgroup@deploy1002: Finished scap: Backport for Followup to 5cb38845: Don't drop revid info (T324801) (duration: 12m 59s)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2004.codfw.wmnet with OS bullseye
  • 10:51 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Followup to 5cb38845: Don't drop revid info (T324801) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:49 ladsgroup@deploy1002: Started scap: Backport for Followup to 5cb38845: Don't drop revid info (T324801)
  • 10:36 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 10:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2003.codfw.wmnet with OS bullseye
  • 09:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2003.codfw.wmnet with reason: host reimage
  • 09:51 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2003.codfw.wmnet with reason: host reimage
  • 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2003.codfw.wmnet with OS bullseye
  • 08:39 marostegui: dbmaint schema change on s8@eqiad T324797
  • 08:39 marostegui: dbmaint schema change on s7@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s6@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s5@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s4@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s2@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s1@eqiad T324797
  • 08:35 marostegui: dbmaint schema change on s3@eqiad T324797
  • 08:02 marostegui: dbmaint schema change on s3 T324797
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42661 and previous config saved to /var/cache/conftool/dbconfig/20221209-075057-root.json
  • 07:36 marostegui: dbmaint schema change on s5 T324797
  • 07:36 marostegui: dbmaint schema change on s1 T324797
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42660 and previous config saved to /var/cache/conftool/dbconfig/20221209-073552-root.json
  • 07:29 marostegui: dbmaint schema change on s6 T324797
  • 07:29 marostegui: dbmaint schema change on s8 T324797
  • 07:29 marostegui: dbmaint schema change on s7 T324797
  • 07:29 marostegui: dbmaint schema change on s4 T324797
  • 07:29 marostegui: dbmaint schema change on s2 T324797
  • 07:28 marostegui: Deploy schema change on s2 T324797
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42659 and previous config saved to /var/cache/conftool/dbconfig/20221209-072047-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42658 and previous config saved to /var/cache/conftool/dbconfig/20221209-070542-root.json
  • 07:00 marostegui: Deploy schema change on s4 T324797
  • 06:58 marostegui: Deploy schema change on s7 T324797
  • 06:57 marostegui: Deploy schema change on s8 T324797
  • 06:55 marostegui: Deploy schema change on s6 T324797
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42657 and previous config saved to /var/cache/conftool/dbconfig/20221209-065037-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42656 and previous config saved to /var/cache/conftool/dbconfig/20221209-063532-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42655 and previous config saved to /var/cache/conftool/dbconfig/20221209-062027-root.json
  • 05:28 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:03 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 04:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 03:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 50 hosts with reason: Rolling restart in progress
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 50 hosts with reason: Rolling restart in progress
  • 03:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 02:39 cwhite@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.40.0-wmf.13"
  • 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS buster
  • 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 02:23 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 02:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 02:08 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 01:49 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS buster
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS buster
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 01:47 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 01:30 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster
  • 01:07 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2003
  • 01:06 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2003
  • 01:06 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2002
  • 01:05 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2002
  • 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
  • 01:04 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
  • 01:01 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2003.codfw.wmnet
  • 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:45 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:43 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:39 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2003.codfw.wmnet
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2002.codfw.wmnet
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:37 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:34 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:30 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2002.codfw.wmnet

2022-12-08

  • 23:32 ladsgroup@deploy1002: Finished scap: Backport for Make wikibase.client.init module target mobile (T235712) (duration: 08m 42s)
  • 23:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Make wikibase.client.init module target mobile (T235712) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 23:23 ladsgroup@deploy1002: Started scap: Backport for Make wikibase.client.init module target mobile (T235712)
  • 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS buster
  • 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 23:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 23:11 ladsgroup@deploy1002: Finished scap: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518) (duration: 09m 12s)
  • 23:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:02 ladsgroup@deploy1002: Started scap: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)
  • 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 22:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS buster
  • 22:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 22:28 TheresNoTime: close UTC late backport and config training (+28m)
  • 22:27 samtar@deploy1002: Finished scap: Backport for Start mobile DiscussionTools A/B test (T321961) (duration: 09m 57s)
  • 22:19 samtar@deploy1002: samtar and matmarex: Backport for Start mobile DiscussionTools A/B test (T321961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 22:17 samtar@deploy1002: Started scap: Backport for Start mobile DiscussionTools A/B test (T321961)
  • 22:16 samtar@deploy1002: Finished scap: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686) (duration: 10m 06s)
  • 22:08 samtar@deploy1002: samtar and matmarex: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:06 samtar@deploy1002: Started scap: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686)
  • 22:05 samtar@deploy1002: Finished scap: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766) (duration: 11m 04s)
  • 22:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 21:56 samtar@deploy1002: samtar and stang: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:54 samtar@deploy1002: Started scap: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766)
  • 21:53 samtar@deploy1002: Finished scap: Backport for specieswiki: Install GeoData extension (T324348) (duration: 09m 16s)
  • 21:46 samtar@deploy1002: samtar and stang: Backport for specieswiki: Install GeoData extension (T324348) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:44 samtar@deploy1002: Started scap: Backport for specieswiki: Install GeoData extension (T324348)
  • 21:39 TheresNoTime: T324348 : `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php specieswiki geodata`
  • 21:37 samtar@deploy1002: Finished scap: Backport for createExtensionTables: Add extension GeoData (T324348) (duration: 08m 01s)
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 21:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 21:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 21:31 samtar@deploy1002: samtar and stang: Backport for createExtensionTables: Add extension GeoData (T324348) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:29 samtar@deploy1002: Started scap: Backport for createExtensionTables: Add extension GeoData (T324348)
  • 21:27 jdrewniak@deploy1002: Finished scap: Backport for Add elwiki and arwiki to desktop-improvements group (T322391) (duration: 08m 31s)
  • 21:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:21 jdrewniak@deploy1002: jdrewniak and jdrewniak: Backport for Add elwiki and arwiki to desktop-improvements group (T322391) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:19 jdrewniak@deploy1002: Started scap: Backport for Add elwiki and arwiki to desktop-improvements group (T322391)
  • 21:17 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 55s)
  • 21:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:10 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 07s)
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 20:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:34 ryankemper: [Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005
  • 20:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:27 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 20:27 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776
  • 20:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776
  • 20:17 ryankemper: T323064 Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1
  • 20:12 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13 refs T320518
  • 20:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:59 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 16:14 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001
  • 16:14 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001
  • 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
  • 16:12 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
  • 16:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 eevans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye
  • 15:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
  • 15:48 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
  • 15:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye
  • 15:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 15:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 15:12 hashar: Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754
  • 15:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:07 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:05 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2001.codfw.wmnet
  • 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42650 and previous config saved to /var/cache/conftool/dbconfig/20221208-150109-ladsgroup.json
  • 14:59 hashar: Restarting Gerrit replica TWICE on gerrit2002.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754
  • 14:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:50 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:50 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:47 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42649 and previous config saved to /var/cache/conftool/dbconfig/20221208-144602-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42648 and previous config saved to /var/cache/conftool/dbconfig/20221208-144152-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42647 and previous config saved to /var/cache/conftool/dbconfig/20221208-144131-ladsgroup.json
  • 14:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:40 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42646 and previous config saved to /var/cache/conftool/dbconfig/20221208-142625-ladsgroup.json
  • 14:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:13 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42645 and previous config saved to /var/cache/conftool/dbconfig/20221208-141118-ladsgroup.json
  • 14:10 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:09 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42644 and previous config saved to /var/cache/conftool/dbconfig/20221208-135611-ladsgroup.json
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42643 and previous config saved to /var/cache/conftool/dbconfig/20221208-135402-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42642 and previous config saved to /var/cache/conftool/dbconfig/20221208-135341-ladsgroup.json
  • 13:43 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42641 and previous config saved to /var/cache/conftool/dbconfig/20221208-133835-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42640 and previous config saved to /var/cache/conftool/dbconfig/20221208-132329-ladsgroup.json
  • 13:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142] (duration: 00m 15s)
  • 13:20 aqu@deploy1002: Started deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142]
  • 13:19 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142] (duration: 00m 09s)
  • 13:19 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142]
  • 13:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42639 and previous config saved to /var/cache/conftool/dbconfig/20221208-130822-ladsgroup.json
  • 13:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42638 and previous config saved to /var/cache/conftool/dbconfig/20221208-130612-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42637 and previous config saved to /var/cache/conftool/dbconfig/20221208-130551-ladsgroup.json
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42635 and previous config saved to /var/cache/conftool/dbconfig/20221208-125045-ladsgroup.json
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42634 and previous config saved to /var/cache/conftool/dbconfig/20221208-124435-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42633 and previous config saved to /var/cache/conftool/dbconfig/20221208-123538-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42632 and previous config saved to /var/cache/conftool/dbconfig/20221208-122928-ladsgroup.json
  • 12:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42631 and previous config saved to /var/cache/conftool/dbconfig/20221208-122032-ladsgroup.json
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42630 and previous config saved to /var/cache/conftool/dbconfig/20221208-121823-ladsgroup.json
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42629 and previous config saved to /var/cache/conftool/dbconfig/20221208-121801-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42628 and previous config saved to /var/cache/conftool/dbconfig/20221208-121422-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42627 and previous config saved to /var/cache/conftool/dbconfig/20221208-120255-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42626 and previous config saved to /var/cache/conftool/dbconfig/20221208-115915-ladsgroup.json
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42625 and previous config saved to /var/cache/conftool/dbconfig/20221208-115659-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42624 and previous config saved to /var/cache/conftool/dbconfig/20221208-115627-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42623 and previous config saved to /var/cache/conftool/dbconfig/20221208-114748-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42622 and previous config saved to /var/cache/conftool/dbconfig/20221208-114120-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42621 and previous config saved to /var/cache/conftool/dbconfig/20221208-113240-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42620 and previous config saved to /var/cache/conftool/dbconfig/20221208-113030-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42619 and previous config saved to /var/cache/conftool/dbconfig/20221208-112951-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42618 and previous config saved to /var/cache/conftool/dbconfig/20221208-112612-ladsgroup.json
  • 11:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267] (duration: 00m 18s)
  • 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267]
  • 11:21 moritzm: drain ganeti5002 for eventual decom T324610
  • 11:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267] (duration: 00m 09s)
  • 11:20 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267]
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42617 and previous config saved to /var/cache/conftool/dbconfig/20221208-111444-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42616 and previous config saved to /var/cache/conftool/dbconfig/20221208-111105-ladsgroup.json
  • 11:10 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between T323771
  • 11:09 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between T323771
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42615 and previous config saved to /var/cache/conftool/dbconfig/20221208-110849-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42614 and previous config saved to /var/cache/conftool/dbconfig/20221208-110828-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42613 and previous config saved to /var/cache/conftool/dbconfig/20221208-105938-ladsgroup.json
  • 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3 30 seconds in between T323771
  • 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3 30 seconds in between T323771
  • 10:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42612 and previous config saved to /var/cache/conftool/dbconfig/20221208-105321-ladsgroup.json
  • 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42611 and previous config saved to /var/cache/conftool/dbconfig/20221208-104432-ladsgroup.json
  • 10:43 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between T323771
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42610 and previous config saved to /var/cache/conftool/dbconfig/20221208-104322-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42609 and previous config saved to /var/cache/conftool/dbconfig/20221208-104300-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42608 and previous config saved to /var/cache/conftool/dbconfig/20221208-103815-ladsgroup.json
  • 10:36 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662) (duration: 09m 17s)
  • 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between
  • 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between
  • 10:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42606 and previous config saved to /var/cache/conftool/dbconfig/20221208-102754-ladsgroup.json
  • 10:26 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662)
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42605 and previous config saved to /var/cache/conftool/dbconfig/20221208-102308-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42604 and previous config saved to /var/cache/conftool/dbconfig/20221208-102052-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42603 and previous config saved to /var/cache/conftool/dbconfig/20221208-102030-ladsgroup.json
  • 10:18 hashar: contint1002: activated Icinga monitoring , all services are up and running # T313832
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42602 and previous config saved to /var/cache/conftool/dbconfig/20221208-101247-ladsgroup.json
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42600 and previous config saved to /var/cache/conftool/dbconfig/20221208-100524-ladsgroup.json
  • 10:01 claime: Deploying puppet enforcement of zuul-merger on contint1002
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42599 and previous config saved to /var/cache/conftool/dbconfig/20221208-095741-ladsgroup.json
  • 09:57 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host test-reimage2001.codfw.wmnet
  • 09:56 steve_munene: restarting varnishkafka-webrequest.service on host cp1075 T323771
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42598 and previous config saved to /var/cache/conftool/dbconfig/20221208-095017-ladsgroup.json
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) test-reimage2001.codfw.wmnet on all recursors
  • 09:50 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache test-reimage2001.codfw.wmnet on all recursors
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
  • 09:49 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
  • 09:46 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:46 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host test-reimage2001.codfw.wmnet
  • 09:43 hashar: contint1002: stopped puppet and manually started zuul-merger. I am monitoring it cause last time we have bring up a new one it had some issues here and there # T313832
  • 09:38 hashar: contint1001: manually stopped and masked zuul-merger. It is under maintenance mode in Icinga # T313832
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42597 and previous config saved to /var/cache/conftool/dbconfig/20221208-093511-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42596 and previous config saved to /var/cache/conftool/dbconfig/20221208-093255-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42595 and previous config saved to /var/cache/conftool/dbconfig/20221208-093218-ladsgroup.json
  • 09:25 hashar@deploy1002: Finished deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # T313832 (duration: 00m 07s)
  • 09:24 hashar@deploy1002: Started deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # T313832
  • 09:17 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # T313832 (duration: 00m 03s)
  • 09:17 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # T313832
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42594 and previous config saved to /var/cache/conftool/dbconfig/20221208-091712-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42593 and previous config saved to /var/cache/conftool/dbconfig/20221208-090205-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42592 and previous config saved to /var/cache/conftool/dbconfig/20221208-085724-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42591 and previous config saved to /var/cache/conftool/dbconfig/20221208-085657-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42590 and previous config saved to /var/cache/conftool/dbconfig/20221208-084659-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42589 and previous config saved to /var/cache/conftool/dbconfig/20221208-084442-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42588 and previous config saved to /var/cache/conftool/dbconfig/20221208-084421-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42587 and previous config saved to /var/cache/conftool/dbconfig/20221208-084151-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42586 and previous config saved to /var/cache/conftool/dbconfig/20221208-082914-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42585 and previous config saved to /var/cache/conftool/dbconfig/20221208-082644-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42584 and previous config saved to /var/cache/conftool/dbconfig/20221208-081408-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42583 and previous config saved to /var/cache/conftool/dbconfig/20221208-081138-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42582 and previous config saved to /var/cache/conftool/dbconfig/20221208-075901-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42581 and previous config saved to /var/cache/conftool/dbconfig/20221208-075645-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42580 and previous config saved to /var/cache/conftool/dbconfig/20221208-075624-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42579 and previous config saved to /var/cache/conftool/dbconfig/20221208-074117-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42578 and previous config saved to /var/cache/conftool/dbconfig/20221208-073122-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42577 and previous config saved to /var/cache/conftool/dbconfig/20221208-073101-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42576 and previous config saved to /var/cache/conftool/dbconfig/20221208-072611-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42575 and previous config saved to /var/cache/conftool/dbconfig/20221208-071554-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42574 and previous config saved to /var/cache/conftool/dbconfig/20221208-071104-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42573 and previous config saved to /var/cache/conftool/dbconfig/20221208-070847-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42572 and previous config saved to /var/cache/conftool/dbconfig/20221208-070825-ladsgroup.json
  • 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42571 and previous config saved to /var/cache/conftool/dbconfig/20221208-070048-ladsgroup.json
  • 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 31800
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 31800
  • 06:55 bblack: lvs1017: restarting pybal to take back text traffic (med reverted to normal, underlying problem w/ ipv6 addressed)
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42570 and previous config saved to /var/cache/conftool/dbconfig/20221208-065319-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42569 and previous config saved to /var/cache/conftool/dbconfig/20221208-064541-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42568 and previous config saved to /var/cache/conftool/dbconfig/20221208-063813-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42567 and previous config saved to /var/cache/conftool/dbconfig/20221208-062306-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42566 and previous config saved to /var/cache/conftool/dbconfig/20221208-062050-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42565 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42564 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42563 and previous config saved to /var/cache/conftool/dbconfig/20221208-062006-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42562 and previous config saved to /var/cache/conftool/dbconfig/20221208-061436-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42561 and previous config saved to /var/cache/conftool/dbconfig/20221208-060551-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42560 and previous config saved to /var/cache/conftool/dbconfig/20221208-060522-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42559 and previous config saved to /var/cache/conftool/dbconfig/20221208-060500-ladsgroup.json
  • 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42558 and previous config saved to /var/cache/conftool/dbconfig/20221208-055930-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42557 and previous config saved to /var/cache/conftool/dbconfig/20221208-055046-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42556 and previous config saved to /var/cache/conftool/dbconfig/20221208-055015-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42555 and previous config saved to /var/cache/conftool/dbconfig/20221208-054953-ladsgroup.json
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42554 and previous config saved to /var/cache/conftool/dbconfig/20221208-054423-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42553 and previous config saved to /var/cache/conftool/dbconfig/20221208-053541-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42552 and previous config saved to /var/cache/conftool/dbconfig/20221208-053509-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42551 and previous config saved to /var/cache/conftool/dbconfig/20221208-053447-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42550 and previous config saved to /var/cache/conftool/dbconfig/20221208-053253-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42549 and previous config saved to /var/cache/conftool/dbconfig/20221208-053236-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 05:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42548 and previous config saved to /var/cache/conftool/dbconfig/20221208-052917-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42547 and previous config saved to /var/cache/conftool/dbconfig/20221208-052705-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42546 and previous config saved to /var/cache/conftool/dbconfig/20221208-052036-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 03:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 02:24 bblack: lvs1017 - restary pybal manually again, back on bgp_med=101 (traffic goes back to lvs1020)
  • 02:21 bblack: restarting pybal on lvs1017 manually again with bgp_med=0 (should take traffic, may or may not do so very usefully!)
  • 02:05 bblack: sretest1001 - puppet disabled, manipulating routing on this host to conduct tests...
  • 01:56 bblack: lvs1017 - manually setting BGP MED to 101 and starting pybal (should come back and and speak BGP, but not steal traffic from lvs1020)
  • 01:29 bblack: lvs1017 - disable puppet and stop pybal to fix ipv6 for now
  • 01:27 bblack: lvs1017: restart pybal, attempt to fix text-ipv6 service
  • 01:05 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic1" lvs at all sites - T324336
  • 01:00 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic2" lvs at all sites - T324336
  • 00:47 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "secondary" lvs at all sites - T324336 (5 hosts, ulsfo completed previously)
  • 00:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1012.eqiad.wmnet with OS bullseye
  • 00:29 bblack: lvs4010: restart pybal to test etcd key changes - T324336
  • 00:16 bblack: disabling puppet on all cp and lvs hosts for conftool key changes. Please coordinate if any lvs/pybal/cpNNNN depooling/work is needed during this transition!
  • 00:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=cdn
  • 00:12 bblack@cumin1001: conftool action : set/weight=1; selector: service=cdn
  • 00:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage
  • 00:04 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage

2022-12-07

  • 23:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1012.eqiad.wmnet with OS bullseye
  • 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42545 and previous config saved to /var/cache/conftool/dbconfig/20221207-233130-ladsgroup.json
  • 23:24 mutante: mx1001 about to run out of disk again - apt-get clean, gzip /var/log/exim4/mainlog.1 find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not T305567
  • 23:23 mutante: mx1001 - apt-get clean, gzip /var/log/exim4/mainlog.1 find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42544 and previous config saved to /var/cache/conftool/dbconfig/20221207-231623-ladsgroup.json
  • 23:14 samtar@deploy1002: Finished scap: Backport for Make parsoid accept all content models. (T324711) (duration: 13m 57s)
  • 23:02 samtar@deploy1002: samtar and samtar: Backport for Make parsoid accept all content models. (T324711) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42543 and previous config saved to /var/cache/conftool/dbconfig/20221207-230116-ladsgroup.json
  • 23:00 samtar@deploy1002: Started scap: Backport for Make parsoid accept all content models. (T324711)
  • 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:48 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:48 TheresNoTime: Going to backport gerrit:865749 to wmf/1.40.0-wmf.13 for T324711
  • 22:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42542 and previous config saved to /var/cache/conftool/dbconfig/20221207-224610-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42541 and previous config saved to /var/cache/conftool/dbconfig/20221207-224502-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42540 and previous config saved to /var/cache/conftool/dbconfig/20221207-224440-ladsgroup.json
  • 22:41 ryankemper: T301167 Downtimed `wdqs20[09-12]` for 7 days
  • 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2009.*
  • 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2010.*
  • 22:35 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:32 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:29 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42539 and previous config saved to /var/cache/conftool/dbconfig/20221207-222934-ladsgroup.json
  • 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:28 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:26 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:25 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:25 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42538 and previous config saved to /var/cache/conftool/dbconfig/20221207-221427-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42537 and previous config saved to /var/cache/conftool/dbconfig/20221207-220110-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42536 and previous config saved to /var/cache/conftool/dbconfig/20221207-215921-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42535 and previous config saved to /var/cache/conftool/dbconfig/20221207-215712-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42534 and previous config saved to /var/cache/conftool/dbconfig/20221207-215651-ladsgroup.json
  • 21:56 TheresNoTime: UTC late backport window done
  • 21:51 samtar@deploy1002: backport aborted: (duration: 00m 15s)
  • 21:49 samtar@deploy1002: Sync cancelled.
  • 21:47 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865773"
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42533 and previous config saved to /var/cache/conftool/dbconfig/20221207-214603-ladsgroup.json
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5003.eqsin.wmnet
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 21:43 samtar@deploy1002: samtar and stang: Backport for specieswiki: Install GeoData extension (T324348) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:43 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 21:41 samtar@deploy1002: Started scap: Backport for specieswiki: Install GeoData extension (T324348)
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42532 and previous config saved to /var/cache/conftool/dbconfig/20221207-214145-ladsgroup.json
  • 21:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 21:40 samtar@deploy1002: Finished scap: Backport for Remove Research Incentive survey from frwiki (T321930) (duration: 09m 04s)
  • 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5003.eqsin.wmnet
  • 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
  • 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
  • 21:34 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865742"
  • 21:32 samtar@deploy1002: samtar and dani: Backport for Remove Research Incentive survey from frwiki (T321930) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS buster
  • 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 21:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42530 and previous config saved to /var/cache/conftool/dbconfig/20221207-213057-ladsgroup.json
  • 21:30 samtar@deploy1002: Started scap: Backport for Remove Research Incentive survey from frwiki (T321930)
  • 21:28 samtar@deploy1002: Finished scap: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672) (duration: 20m 35s)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42529 and previous config saved to /var/cache/conftool/dbconfig/20221207-212638-ladsgroup.json
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42528 and previous config saved to /var/cache/conftool/dbconfig/20221207-211551-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42527 and previous config saved to /var/cache/conftool/dbconfig/20221207-211338-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42526 and previous config saved to /var/cache/conftool/dbconfig/20221207-211317-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42525 and previous config saved to /var/cache/conftool/dbconfig/20221207-211132-ladsgroup.json
  • 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 21:10 samtar@deploy1002: samtar and daniel: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42524 and previous config saved to /var/cache/conftool/dbconfig/20221207-210923-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42523 and previous config saved to /var/cache/conftool/dbconfig/20221207-210902-ladsgroup.json
  • 21:08 samtar@deploy1002: Started scap: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)
  • 21:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 21:02 mutante: contint1002 a2dismod mpm_event - https://phabricator.wikimedia.org/T208108 Bug: T313832
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42522 and previous config saved to /var/cache/conftool/dbconfig/20221207-205811-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42521 and previous config saved to /var/cache/conftool/dbconfig/20221207-205356-ladsgroup.json
  • 20:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS buster
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42520 and previous config saved to /var/cache/conftool/dbconfig/20221207-204304-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42519 and previous config saved to /var/cache/conftool/dbconfig/20221207-203849-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42517 and previous config saved to /var/cache/conftool/dbconfig/20221207-202758-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42516 and previous config saved to /var/cache/conftool/dbconfig/20221207-202545-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42515 and previous config saved to /var/cache/conftool/dbconfig/20221207-202524-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42514 and previous config saved to /var/cache/conftool/dbconfig/20221207-202343-ladsgroup.json
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42513 and previous config saved to /var/cache/conftool/dbconfig/20221207-202134-ladsgroup.json
  • 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42512 and previous config saved to /var/cache/conftool/dbconfig/20221207-202113-ladsgroup.json
  • 20:20 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.13 refs T320518 (duration: 07m 03s)
  • 20:13 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.13 refs T320518
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42511 and previous config saved to /var/cache/conftool/dbconfig/20221207-201016-ladsgroup.json
  • 20:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
  • 20:09 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42510 and previous config saved to /var/cache/conftool/dbconfig/20221207-200606-ladsgroup.json
  • 20:00 mutante: contint* - deploying firewall changes to add contint1002 - T313832
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42509 and previous config saved to /var/cache/conftool/dbconfig/20221207-195510-ladsgroup.json
  • 19:53 mutante: registry* (docker registry HA) - adding contint1002 to allowed hosts gerrit:865680 T313832
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42508 and previous config saved to /var/cache/conftool/dbconfig/20221207-195100-ladsgroup.json
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42507 and previous config saved to /var/cache/conftool/dbconfig/20221207-194003-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42506 and previous config saved to /var/cache/conftool/dbconfig/20221207-193751-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42505 and previous config saved to /var/cache/conftool/dbconfig/20221207-193730-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42504 and previous config saved to /var/cache/conftool/dbconfig/20221207-193553-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42503 and previous config saved to /var/cache/conftool/dbconfig/20221207-193445-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42502 and previous config saved to /var/cache/conftool/dbconfig/20221207-193350-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42501 and previous config saved to /var/cache/conftool/dbconfig/20221207-192223-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42500 and previous config saved to /var/cache/conftool/dbconfig/20221207-191843-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42499 and previous config saved to /var/cache/conftool/dbconfig/20221207-191328-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42498 and previous config saved to /var/cache/conftool/dbconfig/20221207-190717-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42497 and previous config saved to /var/cache/conftool/dbconfig/20221207-190337-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42496 and previous config saved to /var/cache/conftool/dbconfig/20221207-185821-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42495 and previous config saved to /var/cache/conftool/dbconfig/20221207-185210-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42494 and previous config saved to /var/cache/conftool/dbconfig/20221207-184958-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:49 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1026.eqiad.wmnet with OS bullseye
  • 18:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42493 and previous config saved to /var/cache/conftool/dbconfig/20221207-184851-ladsgroup.json
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42492 and previous config saved to /var/cache/conftool/dbconfig/20221207-184830-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42491 and previous config saved to /var/cache/conftool/dbconfig/20221207-184722-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42490 and previous config saved to /var/cache/conftool/dbconfig/20221207-184700-ladsgroup.json
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42489 and previous config saved to /var/cache/conftool/dbconfig/20221207-184315-ladsgroup.json
  • 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42488 and previous config saved to /var/cache/conftool/dbconfig/20221207-183344-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42487 and previous config saved to /var/cache/conftool/dbconfig/20221207-183154-ladsgroup.json
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42486 and previous config saved to /var/cache/conftool/dbconfig/20221207-182808-ladsgroup.json
  • 18:27 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
  • 18:23 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42485 and previous config saved to /var/cache/conftool/dbconfig/20221207-181838-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42484 and previous config saved to /var/cache/conftool/dbconfig/20221207-181647-ladsgroup.json
  • 18:06 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
  • 18:04 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1026']
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42483 and previous config saved to /var/cache/conftool/dbconfig/20221207-180331-ladsgroup.json
  • 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2002.codfw.wmnet with OS bullseye
  • 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42482 and previous config saved to /var/cache/conftool/dbconfig/20221207-180140-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42481 and previous config saved to /var/cache/conftool/dbconfig/20221207-180132-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42480 and previous config saved to /var/cache/conftool/dbconfig/20221207-180119-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42479 and previous config saved to /var/cache/conftool/dbconfig/20221207-180110-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42478 and previous config saved to /var/cache/conftool/dbconfig/20221207-180058-ladsgroup.json
  • 18:00 sukhe: restart pybal on lvs5003 to pick up bgp-med change
  • 17:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:53 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1026']
  • 17:50 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc] (duration: 01m 15s)
  • 17:49 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc]
  • 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:48 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc] (duration: 00m 07s)
  • 17:48 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc]
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 T324692', diff saved to https://phabricator.wikimedia.org/P42477 and previous config saved to /var/cache/conftool/dbconfig/20221207-174811-ladsgroup.json
  • 17:46 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
  • 17:46 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc] (duration: 79m 12s)
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42476 and previous config saved to /var/cache/conftool/dbconfig/20221207-174604-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42475 and previous config saved to /var/cache/conftool/dbconfig/20221207-174551-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary T324692', diff saved to https://phabricator.wikimedia.org/P42474 and previous config saved to /var/cache/conftool/dbconfig/20221207-174540-ladsgroup.json
  • 17:45 Amir1: Starting s1 codfw failover from db2112 to db2103 - T324692
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 17:42 sukhe: restart pybal on lvs5005 to pick up bgp-med
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42473 and previous config saved to /var/cache/conftool/dbconfig/20221207-173350-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42472 and previous config saved to /var/cache/conftool/dbconfig/20221207-173329-ladsgroup.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42471 and previous config saved to /var/cache/conftool/dbconfig/20221207-173057-ladsgroup.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42470 and previous config saved to /var/cache/conftool/dbconfig/20221207-173045-ladsgroup.json
  • 17:27 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 17:26 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 17:26 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1026.eqiad.wmnet with OS bullseye
  • 17:25 sukhe: running homer for Gerrit: 865712
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42469 and previous config saved to /var/cache/conftool/dbconfig/20221207-171822-ladsgroup.json
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42468 and previous config saved to /var/cache/conftool/dbconfig/20221207-171803-ladsgroup.json
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5002.eqsin.wmnet
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42467 and previous config saved to /var/cache/conftool/dbconfig/20221207-171551-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42466 and previous config saved to /var/cache/conftool/dbconfig/20221207-171538-ladsgroup.json
  • 17:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
  • 17:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T324692', diff saved to https://phabricator.wikimedia.org/P42465 and previous config saved to /var/cache/conftool/dbconfig/20221207-171416-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42464 and previous config saved to /var/cache/conftool/dbconfig/20221207-171342-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42463 and previous config saved to /var/cache/conftool/dbconfig/20221207-171326-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42462 and previous config saved to /var/cache/conftool/dbconfig/20221207-171321-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42461 and previous config saved to /var/cache/conftool/dbconfig/20221207-171305-ladsgroup.json
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
  • 17:12 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T324692
  • 17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T324692
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
  • 17:08 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5002.eqsin.wmnet
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42460 and previous config saved to /var/cache/conftool/dbconfig/20221207-170316-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42459 and previous config saved to /var/cache/conftool/dbconfig/20221207-170256-ladsgroup.json
  • 17:01 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581) (duration: 14m 46s)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42458 and previous config saved to /var/cache/conftool/dbconfig/20221207-165815-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42457 and previous config saved to /var/cache/conftool/dbconfig/20221207-165758-ladsgroup.json
  • 16:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 16:56 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:48 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42456 and previous config saved to /var/cache/conftool/dbconfig/20221207-164809-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42455 and previous config saved to /var/cache/conftool/dbconfig/20221207-164748-ladsgroup.json
  • 16:46 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581)
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42454 and previous config saved to /var/cache/conftool/dbconfig/20221207-164308-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42453 and previous config saved to /var/cache/conftool/dbconfig/20221207-164258-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42452 and previous config saved to /var/cache/conftool/dbconfig/20221207-164252-ladsgroup.json
  • 16:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:42 sukhe: restart pybal on lvs5002
  • 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
  • 16:38 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.240/28 next-hop 10.132.0.6: T322048
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42451 and previous config saved to /var/cache/conftool/dbconfig/20221207-163242-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42450 and previous config saved to /var/cache/conftool/dbconfig/20221207-163031-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:29 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42449 and previous config saved to /var/cache/conftool/dbconfig/20221207-162802-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42448 and previous config saved to /var/cache/conftool/dbconfig/20221207-162745-ladsgroup.json
  • 16:27 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc]
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42447 and previous config saved to /var/cache/conftool/dbconfig/20221207-162553-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42446 and previous config saved to /var/cache/conftool/dbconfig/20221207-162533-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:25 aqu: Deploying analytics/refinery (HDFS usage scripts)
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 16:24 sukhe: run homer in cr*-eqsin for Gerrit: 865615
  • 16:09 denisse: sync rancid in netmon2001 and netmon2002
  • 16:08 denisse: sync librenms RRD in netmon2002
  • 16:08 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581) (duration: 10m 59s)
  • 16:06 sukhe: run homer in cr*-eqsin for Gerrit: 865660
  • 16:02 denisse: Sync LibreNMS RRD in netmon2001
  • 15:59 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:57 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581)
  • 15:46 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581) (duration: 08m 29s)
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS buster
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:39 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:37 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581)
  • 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS buster
  • 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:24 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 15:11 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 15:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 14:56 krinkle@deploy1002: Finished deploy [performance/navtiming@6caa033]: (no justification provided) (duration: 00m 07s)
  • 14:56 krinkle@deploy1002: Started deploy [performance/navtiming@6caa033]: (no justification provided)
  • 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS buster
  • 14:38 XioNoX: draining Arelion eqiad-codfw circuit for optic replacement
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS buster
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5002.wikimedia.org
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:27 moritzm: restarting ntpd
  • 14:26 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:24 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:20 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5002.wikimedia.org
  • 14:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
  • 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
  • 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
  • 14:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
  • 13:42 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs T320518 (duration: 07m 45s)
  • 13:34 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
  • 13:22 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42443 and previous config saved to /var/cache/conftool/dbconfig/20221207-131858-marostegui.json
  • 13:09 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
  • 13:09 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
  • 12:37 moritzm: upgrading mwmaint servers to PHP 7.4.33
  • 12:33 moritzm: upgrading deployment servers to PHP 7.4.33
  • 12:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:32 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:30 moritzm: upgrading cloudweb to PHP 7.4.33
  • 12:28 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:28 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 12:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 12:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:15 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:12 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:12 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 12:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 12:10 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 12:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 12:09 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 12:09 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 11:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:59 moritzm: imported rsyslog 8.2102.0-2+deb11u1~buster1 to component/rsyslog-openssl T324623
  • 11:57 moritzm: imported librelp 1.10.0-1~buster1 to component/rsyslog-openssl T324623
  • 11:55 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:55 sukhe: running authdns-update for Gerrit: 865605
  • 11:55 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 11:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:54 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 11:51 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks (duration: 00m 14s)
  • 11:50 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks
  • 11:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:11 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin2001.codfw.wmnet
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin2001.codfw.wmnet on all recursors
  • 11:06 volans@cumin2002: START - Cookbook sre.dns.wipe-cache cloudcumin2001.codfw.wmnet on all recursors
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
  • 11:05 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
  • 11:03 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:01 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host cloudcumin2001.codfw.wmnet
  • 10:58 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin1001.eqiad.wmnet
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin1001.eqiad.wmnet on all recursors
  • 10:53 volans@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcumin1001.eqiad.wmnet on all recursors
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
  • 10:52 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
  • 10:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudcumin1001.eqiad.wmnet
  • 10:35 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581) (duration: 10m 29s)
  • 10:27 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 10:26 claime: rebooted contin1001.eqiad.wmnet
  • 10:25 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581)
  • 10:17 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581) (duration: 10m 48s)
  • 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
  • 10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
  • 10:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 714
  • 10:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 10:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16276
  • 10:09 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:09 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:07 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581)
  • 10:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 10:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35320
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35320
  • 10:00 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 8932
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
  • 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13150
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138064
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138064
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
  • 09:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
  • 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31800
  • 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
  • 09:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 31800
  • 09:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
  • 09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45430
  • 09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45430
  • 09:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7568
  • 09:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7568
  • 09:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:34 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581) (duration: 09m 08s)
  • 09:27 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 09:25 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581)
  • 09:23 jiji@deploy1002: backport aborted: (duration: 00m 18s)
  • 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 395570
  • 09:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 395570
  • 09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 397715
  • 09:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 397715
  • 09:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1108.eqiad.wmnet
  • 09:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
  • 09:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42439 and previous config saved to /var/cache/conftool/dbconfig/20221207-071831-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42438 and previous config saved to /var/cache/conftool/dbconfig/20221207-070326-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42437 and previous config saved to /var/cache/conftool/dbconfig/20221207-064821-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42436 and previous config saved to /var/cache/conftool/dbconfig/20221207-063316-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42435 and previous config saved to /var/cache/conftool/dbconfig/20221207-061810-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42434 and previous config saved to /var/cache/conftool/dbconfig/20221207-060305-root.json
  • 05:58 marostegui: Drop phab1001 grants from m3 databases T323418
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42433 and previous config saved to /var/cache/conftool/dbconfig/20221207-054759-root.json
  • 03:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1011.eqiad.wmnet with OS bullseye
  • 03:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
  • 03:15 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
  • 02:49 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1011.eqiad.wmnet with OS bullseye
  • 02:39 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1011']
  • 02:33 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1011']
  • 00:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 00:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1023.eqiad.wmnet with OS bullseye

2022-12-06

  • 23:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 23:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 23:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 22:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
  • 22:37 tgr_: UTC late backports done
  • 22:36 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
  • 22:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 22:26 tgr@deploy1002: Finished scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285) (duration: 18m 58s)
  • 22:09 tgr@deploy1002: tgr and tgr: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:07 tgr@deploy1002: Started scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)
  • 21:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 21:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
  • 20:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
  • 20:25 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:25 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:24 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1028.eqiad.wmnet with OS bullseye
  • 20:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1029']
  • 20:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
  • 20:13 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1029']
  • 20:06 eileen: civicrm upgraded from c9761fee to 3ae68ab4
  • 20:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
  • 20:01 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
  • 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bullseye
  • 19:57 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
  • 19:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bullseye
  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bullseye
  • 19:45 ejegg: payments-wiki upgraded from a875f2b9 to 1914b6c7
  • 19:40 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 19:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 19:38 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1027.eqiad.wmnet with OS bullseye
  • 19:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 19:32 ladsgroup@deploy1002: Finished scap: Backport for Avoid syntax error on hover in grade C browsers (T324514) (duration: 12m 43s)
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 19:32 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 19:22 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 19:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1028']
  • 19:21 ladsgroup@deploy1002: ladsgroup and jdlrobson: Backport for Avoid syntax error on hover in grade C browsers (T324514) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 19:19 ladsgroup@deploy1002: Started scap: Backport for Avoid syntax error on hover in grade C browsers (T324514)
  • 19:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.13 refs T320518
  • 19:16 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
  • 19:14 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 19:12 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bullseye
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bullseye
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bullseye
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5006']
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5007']
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5005']
  • 18:56 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
  • 18:55 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:45 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5007']
  • 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5006']
  • 18:44 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5005']
  • 18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns5003']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5006']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5005']
  • 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1027']
  • 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 18:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns5003']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5006']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5005']
  • 18:28 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['logstash1028']
  • 18:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 18:27 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 18:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 18:26 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:18 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
  • 18:08 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:07 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
  • 18:07 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
  • 18:07 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 09s)
  • 18:06 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:05 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:02 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 18:02 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 17:56 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1027.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5003
  • 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5003
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5007
  • 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5007
  • 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5006
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:24 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5006
  • 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5005
  • 17:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5005
  • 17:22 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 17:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
  • 17:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
  • 17:17 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
  • 17:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:12 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
  • 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:46 kostajh: UTC afternoon backports done
  • 16:44 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Start oldimpact experiment (T323526) (duration: 10m 54s)
  • 16:35 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Start oldimpact experiment (T323526) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:33 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Start oldimpact experiment (T323526)
  • 16:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
  • 16:30 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686) (duration: 10m 14s)
  • 16:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 16:21 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5005
  • 16:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5005
  • 16:21 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
  • 16:19 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686)
  • 16:18 kharlan@deploy1002: backport aborted: (duration: 02m 53s)
  • 16:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
  • 16:15 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:13 kharlan@deploy1002: Finished scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286) (duration: 29m 43s)
  • 16:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:10 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:02 kharlan@deploy1002: kharlan and urbanecm and kharlan: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.
  • 15:45 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 17s)
  • 15:44 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
  • 15:43 kharlan@deploy1002: Started scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286)
  • 15:41 reedy@deploy1002: Synchronized php-1.40.0-wmf.13/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 01s)
  • 15:33 reedy@deploy1002: Synchronized php-1.40.0-wmf.12/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 13s)
  • 15:20 kharlan@deploy1002: Finished scap: Backport for Localisation updates from https://translatewiki.net. (duration: 10m 48s)
  • 15:13 kharlan@deploy1002: kharlan and urbanecm: Backport for Localisation updates from https://translatewiki.net. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:10 kharlan@deploy1002: Started scap: Backport for Localisation updates from https://translatewiki.net.
  • 14:52 kharlan@deploy1002: Finished scap: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198) (duration: 10m 07s)
  • 14:44 kharlan@deploy1002: kharlan and kharlan: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:42 kharlan@deploy1002: Started scap: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)
  • 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:25 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:21 kharlan@deploy1002: Finished scap: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285) (duration: 09m 04s)
  • 14:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 14:13 kharlan@deploy1002: kharlan and kharlan: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:12 kharlan@deploy1002: Started scap: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285)
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
  • 13:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 13:00 kharlan@deploy1002: Finished scap: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542) (duration: 07m 57s)
  • 12:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:54 kharlan@deploy1002: kharlan and kharlan: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:52 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 12:52 kharlan@deploy1002: Started scap: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)
  • 12:49 moritzm: installing glibc security updates on buster
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:29 jnuche@deploy1002: Pruned MediaWiki: 1.40.0-wmf.10 (duration: 02m 09s)
  • 12:27 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 12:27 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx (exit_code=1) rolling restart_daemons on A:wcqs-public
  • 12:27 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs T320518 (duration: 05m 52s)
  • 12:21 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 12:14 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 12:10 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 11:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 10:59 kostajh: UTC morning deploys done
  • 10:56 moritzm: installing freetype security updates
  • 10:48 kharlan@deploy1002: Finished scap: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286) (duration: 31m 25s)
  • 10:36 kharlan@deploy1002: kharlan and kharlan: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:16 kharlan@deploy1002: Started scap: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286)
  • 09:37 kharlan@deploy1002: Finished scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285) (duration: 28m 05s)
  • 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:09 kharlan@deploy1002: Started scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285)
  • 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42426 and previous config saved to /var/cache/conftool/dbconfig/20221206-064402-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42425 and previous config saved to /var/cache/conftool/dbconfig/20221206-062856-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42424 and previous config saved to /var/cache/conftool/dbconfig/20221206-061349-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42423 and previous config saved to /var/cache/conftool/dbconfig/20221206-055843-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42422 and previous config saved to /var/cache/conftool/dbconfig/20221206-054030-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42421 and previous config saved to /var/cache/conftool/dbconfig/20221206-053911-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42420 and previous config saved to /var/cache/conftool/dbconfig/20221206-053850-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42419 and previous config saved to /var/cache/conftool/dbconfig/20221206-052523-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42418 and previous config saved to /var/cache/conftool/dbconfig/20221206-052343-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42417 and previous config saved to /var/cache/conftool/dbconfig/20221206-051016-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42416 and previous config saved to /var/cache/conftool/dbconfig/20221206-050837-ladsgroup.json
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42415 and previous config saved to /var/cache/conftool/dbconfig/20221206-045510-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42414 and previous config saved to /var/cache/conftool/dbconfig/20221206-045330-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42413 and previous config saved to /var/cache/conftool/dbconfig/20221206-043348-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42412 and previous config saved to /var/cache/conftool/dbconfig/20221206-043326-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42411 and previous config saved to /var/cache/conftool/dbconfig/20221206-042850-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 04:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42410 and previous config saved to /var/cache/conftool/dbconfig/20221206-042828-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42409 and previous config saved to /var/cache/conftool/dbconfig/20221206-041820-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42408 and previous config saved to /var/cache/conftool/dbconfig/20221206-041322-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42407 and previous config saved to /var/cache/conftool/dbconfig/20221206-040313-ladsgroup.json
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42406 and previous config saved to /var/cache/conftool/dbconfig/20221206-035815-ladsgroup.json
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42405 and previous config saved to /var/cache/conftool/dbconfig/20221206-034806-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42404 and previous config saved to /var/cache/conftool/dbconfig/20221206-034309-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42403 and previous config saved to /var/cache/conftool/dbconfig/20221206-032818-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42402 and previous config saved to /var/cache/conftool/dbconfig/20221206-032756-ladsgroup.json
  • 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42401 and previous config saved to /var/cache/conftool/dbconfig/20221206-031250-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42400 and previous config saved to /var/cache/conftool/dbconfig/20221206-025831-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42399 and previous config saved to /var/cache/conftool/dbconfig/20221206-025821-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42398 and previous config saved to /var/cache/conftool/dbconfig/20221206-025743-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42397 and previous config saved to /var/cache/conftool/dbconfig/20221206-024314-ladsgroup.json
  • 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42396 and previous config saved to /var/cache/conftool/dbconfig/20221206-024236-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42395 and previous config saved to /var/cache/conftool/dbconfig/20221206-022817-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42394 and previous config saved to /var/cache/conftool/dbconfig/20221206-022808-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42393 and previous config saved to /var/cache/conftool/dbconfig/20221206-021638-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42392 and previous config saved to /var/cache/conftool/dbconfig/20221206-021617-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42391 and previous config saved to /var/cache/conftool/dbconfig/20221206-021310-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42390 and previous config saved to /var/cache/conftool/dbconfig/20221206-021301-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42389 and previous config saved to /var/cache/conftool/dbconfig/20221206-020110-ladsgroup.json
  • 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42388 and previous config saved to /var/cache/conftool/dbconfig/20221206-015757-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42387 and previous config saved to /var/cache/conftool/dbconfig/20221206-014604-ladsgroup.json
  • 01:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42386 and previous config saved to /var/cache/conftool/dbconfig/20221206-014251-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42385 and previous config saved to /var/cache/conftool/dbconfig/20221206-014046-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42384 and previous config saved to /var/cache/conftool/dbconfig/20221206-014038-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42383 and previous config saved to /var/cache/conftool/dbconfig/20221206-014017-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42382 and previous config saved to /var/cache/conftool/dbconfig/20221206-013057-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42381 and previous config saved to /var/cache/conftool/dbconfig/20221206-012812-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42380 and previous config saved to /var/cache/conftool/dbconfig/20221206-012750-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42378 and previous config saved to /var/cache/conftool/dbconfig/20221206-012510-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42377 and previous config saved to /var/cache/conftool/dbconfig/20221206-011244-ladsgroup.json
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42376 and previous config saved to /var/cache/conftool/dbconfig/20221206-011128-ladsgroup.json
  • 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42375 and previous config saved to /var/cache/conftool/dbconfig/20221206-011033-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42374 and previous config saved to /var/cache/conftool/dbconfig/20221206-011003-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42373 and previous config saved to /var/cache/conftool/dbconfig/20221206-005737-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42372 and previous config saved to /var/cache/conftool/dbconfig/20221206-005526-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42371 and previous config saved to /var/cache/conftool/dbconfig/20221206-005457-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42370 and previous config saved to /var/cache/conftool/dbconfig/20221206-005401-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42369 and previous config saved to /var/cache/conftool/dbconfig/20221206-005339-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42368 and previous config saved to /var/cache/conftool/dbconfig/20221206-005244-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 00:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42367 and previous config saved to /var/cache/conftool/dbconfig/20221206-005223-ladsgroup.json
  • 00:51 cstone: payments-wiki upgraded from b613ddfb to 0cd7e779
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42366 and previous config saved to /var/cache/conftool/dbconfig/20221206-004231-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42365 and previous config saved to /var/cache/conftool/dbconfig/20221206-003833-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42364 and previous config saved to /var/cache/conftool/dbconfig/20221206-003716-ladsgroup.json
  • 00:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42363 and previous config saved to /var/cache/conftool/dbconfig/20221206-002945-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42362 and previous config saved to /var/cache/conftool/dbconfig/20221206-002326-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42361 and previous config saved to /var/cache/conftool/dbconfig/20221206-002210-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42360 and previous config saved to /var/cache/conftool/dbconfig/20221206-001438-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42359 and previous config saved to /var/cache/conftool/dbconfig/20221206-000820-ladsgroup.json
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42358 and previous config saved to /var/cache/conftool/dbconfig/20221206-000703-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42357 and previous config saved to /var/cache/conftool/dbconfig/20221206-000654-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42356 and previous config saved to /var/cache/conftool/dbconfig/20221206-000633-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42355 and previous config saved to /var/cache/conftool/dbconfig/20221206-000444-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42354 and previous config saved to /var/cache/conftool/dbconfig/20221206-000329-ladsgroup.json

2022-12-05

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
  • 23:57 tzatziki: removing 2 files for legal compliance
  • 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
  • 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
  • 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates (duration: 02m 30s)
  • 23:44 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42349 and previous config saved to /var/cache/conftool/dbconfig/20221205-234425-ladsgroup.json
  • 23:41 tzatziki: removing 5 files for legal compliance
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42348 and previous config saved to /var/cache/conftool/dbconfig/20221205-233620-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42347 and previous config saved to /var/cache/conftool/dbconfig/20221205-233316-ladsgroup.json
  • 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42346 and previous config saved to /var/cache/conftool/dbconfig/20221205-232453-ladsgroup.json
  • 23:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42345 and previous config saved to /var/cache/conftool/dbconfig/20221205-232432-ladsgroup.json
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42344 and previous config saved to /var/cache/conftool/dbconfig/20221205-232113-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42343 and previous config saved to /var/cache/conftool/dbconfig/20221205-231948-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 23:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42342 and previous config saved to /var/cache/conftool/dbconfig/20221205-231926-ladsgroup.json
  • 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42341 and previous config saved to /var/cache/conftool/dbconfig/20221205-231809-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42340 and previous config saved to /var/cache/conftool/dbconfig/20221205-231608-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42339 and previous config saved to /var/cache/conftool/dbconfig/20221205-231556-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42338 and previous config saved to /var/cache/conftool/dbconfig/20221205-231535-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42337 and previous config saved to /var/cache/conftool/dbconfig/20221205-230925-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42336 and previous config saved to /var/cache/conftool/dbconfig/20221205-230419-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42335 and previous config saved to /var/cache/conftool/dbconfig/20221205-230102-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42334 and previous config saved to /var/cache/conftool/dbconfig/20221205-230028-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42333 and previous config saved to /var/cache/conftool/dbconfig/20221205-225419-ladsgroup.json
  • 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42332 and previous config saved to /var/cache/conftool/dbconfig/20221205-224913-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42331 and previous config saved to /var/cache/conftool/dbconfig/20221205-224555-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42330 and previous config saved to /var/cache/conftool/dbconfig/20221205-224522-ladsgroup.json
  • 22:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42329 and previous config saved to /var/cache/conftool/dbconfig/20221205-223912-ladsgroup.json
  • 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42328 and previous config saved to /var/cache/conftool/dbconfig/20221205-223406-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42326 and previous config saved to /var/cache/conftool/dbconfig/20221205-223140-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42325 and previous config saved to /var/cache/conftool/dbconfig/20221205-223119-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42324 and previous config saved to /var/cache/conftool/dbconfig/20221205-223049-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42323 and previous config saved to /var/cache/conftool/dbconfig/20221205-223015-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42322 and previous config saved to /var/cache/conftool/dbconfig/20221205-222903-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42321 and previous config saved to /var/cache/conftool/dbconfig/20221205-222852-ladsgroup.json
  • 22:24 tzatziki: removing 1 file for legal compliance
  • 22:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42320 and previous config saved to /var/cache/conftool/dbconfig/20221205-221612-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42319 and previous config saved to /var/cache/conftool/dbconfig/20221205-221346-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42317 and previous config saved to /var/cache/conftool/dbconfig/20221205-220105-ladsgroup.json
  • 22:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
  • 21:59 mutante: deleting special DNS entries for "phab10010-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox and syncing netbox data - T296022
  • 21:58 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42316 and previous config saved to /var/cache/conftool/dbconfig/20221205-215839-ladsgroup.json
  • 21:55 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 21:55 mutante: deleting special DNS entries for "phab10010-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox - T280597
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42315 and previous config saved to /var/cache/conftool/dbconfig/20221205-215436-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42314 and previous config saved to /var/cache/conftool/dbconfig/20221205-215415-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42313 and previous config saved to /var/cache/conftool/dbconfig/20221205-214801-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42312 and previous config saved to /var/cache/conftool/dbconfig/20221205-214740-ladsgroup.json
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42311 and previous config saved to /var/cache/conftool/dbconfig/20221205-214558-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42310 and previous config saved to /var/cache/conftool/dbconfig/20221205-214333-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42309 and previous config saved to /var/cache/conftool/dbconfig/20221205-214332-ladsgroup.json
  • 21:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42308 and previous config saved to /var/cache/conftool/dbconfig/20221205-214255-ladsgroup.json
  • 21:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42307 and previous config saved to /var/cache/conftool/dbconfig/20221205-214120-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42306 and previous config saved to /var/cache/conftool/dbconfig/20221205-214058-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42305 and previous config saved to /var/cache/conftool/dbconfig/20221205-213908-ladsgroup.json
  • 21:33 TheresNoTime: close UTC late backport window
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42304 and previous config saved to /var/cache/conftool/dbconfig/20221205-213233-ladsgroup.json
  • 21:31 samtar@deploy1002: Finished scap: Backport for Adjust to changes to redlink behavior from parsoid (T324352) (duration: 09m 05s)
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42303 and previous config saved to /var/cache/conftool/dbconfig/20221205-212748-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42302 and previous config saved to /var/cache/conftool/dbconfig/20221205-212552-ladsgroup.json
  • 21:24 samtar@deploy1002: samtar and matmarex: Backport for Adjust to changes to redlink behavior from parsoid (T324352) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42301 and previous config saved to /var/cache/conftool/dbconfig/20221205-212402-ladsgroup.json
  • 21:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:22 samtar@deploy1002: Started scap: Backport for Adjust to changes to redlink behavior from parsoid (T324352)
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42300 and previous config saved to /var/cache/conftool/dbconfig/20221205-211727-ladsgroup.json
  • 21:17 samtar@deploy1002: Finished scap: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714) (duration: 09m 55s)
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42299 and previous config saved to /var/cache/conftool/dbconfig/20221205-211405-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42298 and previous config saved to /var/cache/conftool/dbconfig/20221205-211242-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42297 and previous config saved to /var/cache/conftool/dbconfig/20221205-211045-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42296 and previous config saved to /var/cache/conftool/dbconfig/20221205-210855-ladsgroup.json
  • 21:08 samtar@deploy1002: samtar and matmarex: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:07 samtar@deploy1002: Started scap: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714)
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42295 and previous config saved to /var/cache/conftool/dbconfig/20221205-210220-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42294 and previous config saved to /var/cache/conftool/dbconfig/20221205-205859-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42293 and previous config saved to /var/cache/conftool/dbconfig/20221205-205735-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42292 and previous config saved to /var/cache/conftool/dbconfig/20221205-205610-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42291 and previous config saved to /var/cache/conftool/dbconfig/20221205-205547-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42290 and previous config saved to /var/cache/conftool/dbconfig/20221205-205537-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42289 and previous config saved to /var/cache/conftool/dbconfig/20221205-205324-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42288 and previous config saved to /var/cache/conftool/dbconfig/20221205-205303-ladsgroup.json
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts phab1001.eqiad.wmnet
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 20:44 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42287 and previous config saved to /var/cache/conftool/dbconfig/20221205-204352-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42286 and previous config saved to /var/cache/conftool/dbconfig/20221205-204034-ladsgroup.json
  • 20:38 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42285 and previous config saved to /var/cache/conftool/dbconfig/20221205-203756-ladsgroup.json
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42284 and previous config saved to /var/cache/conftool/dbconfig/20221205-202846-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42283 and previous config saved to /var/cache/conftool/dbconfig/20221205-202528-ladsgroup.json
  • 20:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts phab1001.eqiad.wmnet
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42282 and previous config saved to /var/cache/conftool/dbconfig/20221205-202250-ladsgroup.json
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42281 and previous config saved to /var/cache/conftool/dbconfig/20221205-202029-ladsgroup.json
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42280 and previous config saved to /var/cache/conftool/dbconfig/20221205-202008-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42279 and previous config saved to /var/cache/conftool/dbconfig/20221205-201831-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42278 and previous config saved to /var/cache/conftool/dbconfig/20221205-201810-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42277 and previous config saved to /var/cache/conftool/dbconfig/20221205-201021-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42276 and previous config saved to /var/cache/conftool/dbconfig/20221205-200755-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42275 and previous config saved to /var/cache/conftool/dbconfig/20221205-200743-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42274 and previous config saved to /var/cache/conftool/dbconfig/20221205-200530-ladsgroup.json
  • 20:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42273 and previous config saved to /var/cache/conftool/dbconfig/20221205-200501-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42272 and previous config saved to /var/cache/conftool/dbconfig/20221205-200303-ladsgroup.json
  • 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42271 and previous config saved to /var/cache/conftool/dbconfig/20221205-195842-ladsgroup.json
  • 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:57 mutante: phab1004 (prod) - removing phab1001 from firewall rules, rsync config | phab1001 (formerly prod) - removing prod role T323418 T280597
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42270 and previous config saved to /var/cache/conftool/dbconfig/20221205-194955-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42269 and previous config saved to /var/cache/conftool/dbconfig/20221205-194757-ladsgroup.json
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42268 and previous config saved to /var/cache/conftool/dbconfig/20221205-193949-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42267 and previous config saved to /var/cache/conftool/dbconfig/20221205-193448-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42266 and previous config saved to /var/cache/conftool/dbconfig/20221205-193250-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42265 and previous config saved to /var/cache/conftool/dbconfig/20221205-193203-ladsgroup.json
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42264 and previous config saved to /var/cache/conftool/dbconfig/20221205-192442-ladsgroup.json
  • 19:24 mutante: phab1001, previous long time phabricator host, is about to be shut down, made a final copy of /srv/deployment, /root, /home, /etc and synced it to phab1004 - T323418
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42263 and previous config saved to /var/cache/conftool/dbconfig/20221205-191656-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42262 and previous config saved to /var/cache/conftool/dbconfig/20221205-190935-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42261 and previous config saved to /var/cache/conftool/dbconfig/20221205-190710-ladsgroup.json
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42260 and previous config saved to /var/cache/conftool/dbconfig/20221205-190150-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42259 and previous config saved to /var/cache/conftool/dbconfig/20221205-185429-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42258 and previous config saved to /var/cache/conftool/dbconfig/20221205-185205-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42257 and previous config saved to /var/cache/conftool/dbconfig/20221205-184950-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42256 and previous config saved to /var/cache/conftool/dbconfig/20221205-184944-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42255 and previous config saved to /var/cache/conftool/dbconfig/20221205-184643-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42254 and previous config saved to /var/cache/conftool/dbconfig/20221205-183851-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42253 and previous config saved to /var/cache/conftool/dbconfig/20221205-183712-ladsgroup.json
  • 18:37 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1033.eqiad.wmnet with OS bullseye
  • 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42252 and previous config saved to /var/cache/conftool/dbconfig/20221205-183700-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42251 and previous config saved to /var/cache/conftool/dbconfig/20221205-182155-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5016.eqsin.wmnet
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:39 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:34 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5016.eqsin.wmnet
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=varnish-fe
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-be
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-be
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5024.eqsin.wmnet,service=ats-be
  • 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1034.eqiad.wmnet with OS bullseye
  • 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1035.eqiad.wmnet with OS bullseye
  • 17:02 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 16:59 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 16:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5015.eqsin.wmnet
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:56 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 16:56 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 16:53 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 16:53 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5015.eqsin.wmnet
  • 16:44 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1033.eqiad.wmnet with OS bullseye
  • 16:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
  • 16:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
  • 16:41 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1034.eqiad.wmnet with OS bullseye
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=varnish-fe
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-be
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-tls
  • 16:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1010.eqiad.wmnet with OS bullseye
  • 16:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1035.eqiad.wmnet with OS bullseye
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5027.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5023.eqsin.wmnet,service=ats-be
  • 16:27 klausman: restarted kube-apiserver on ml-staging-ctrl2001 to adress high latency
  • 16:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
  • 16:11 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
  • 16:06 klausman: restarted kube-apiserver on ml-serve-ctrl1001 to adress high latency and large number of 504s
  • 16:06 moritzm: installing glibc security updates on buster
  • 15:46 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1010.eqiad.wmnet with OS bullseye
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5012,5014].eqsin.wmnet
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:36 moritzm: installing apache2 security updates on buster
  • 15:35 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5012,5014].eqsin.wmnet
  • 15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
  • 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=varnish-fe
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-be
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-tls
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=varnish-fe
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-be
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5026.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5022.eqsin.wmnet,service=ats-be
  • 15:14 andrewbogott: deleted wikitech-static-ord-prebuster image backup in rackspace cloud. Here concludes the wikitech-static upgrade to Buster and php7.4
  • 15:07 root@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:06 root@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:06 root@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:05 root@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5011,5013].eqsin.wmnet
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:54 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5011,5013].eqsin.wmnet
  • 14:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=varnish-fe
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-be
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-tls
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=varnish-fe
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-be
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-tls
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:37 TheresNoTime: closing UTC afternoon backport window
  • 14:36 samtar@deploy1002: Finished scap: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393) (duration: 08m 37s)
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5025.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 14:29 samtar@deploy1002: samtar and stang: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42249 and previous config saved to /var/cache/conftool/dbconfig/20221205-142752-marostegui.json
  • 14:27 samtar@deploy1002: Started scap: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393)
  • 14:26 samtar@deploy1002: Finished scap: Backport for Add Property (120) to Wikidata content Namespace (T321282) (duration: 16m 59s)
  • 14:18 samtar@deploy1002: samtar and gtzatchkova: Backport for Add Property (120) to Wikidata content Namespace (T321282) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:09 samtar@deploy1002: Started scap: Backport for Add Property (120) to Wikidata content Namespace (T321282)
  • 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2127 T324180', diff saved to https://phabricator.wikimedia.org/P42247 and previous config saved to /var/cache/conftool/dbconfig/20221205-135932-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 primary T324180', diff saved to https://phabricator.wikimedia.org/P42246 and previous config saved to /var/cache/conftool/dbconfig/20221205-135539-ladsgroup.json
  • 13:55 Amir1: Starting s3 codfw failover from db2127 to db2105 - T324180
  • 13:51 dcausse: repooling wdqs1004
  • 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55818
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2105 with weight 0 T324180', diff saved to https://phabricator.wikimedia.org/P42245 and previous config saved to /var/cache/conftool/dbconfig/20221205-134346-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T324180
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T324180
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55818
  • 13:31 TheresNoTime: T302486 : [samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --ns 828 --delete
  • 13:24 moritzm: installing postgresql-common bugfix updates from Buster 10.13 point release
  • 13:17 moritzm: installing distro-info-data bugfix updates from Buster 10.13 point release
  • 13:12 moritzm: installing libnet-ssleay-perl bugfix updates from Buster 10.13 point release
  • 12:50 moritzm: installing python-keystoneauth1 bugfix updates from Buster 10.13 point release
  • 12:41 root@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:41 root@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 12:41 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:39 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:51 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:50 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42243 and previous config saved to /var/cache/conftool/dbconfig/20221205-113746-marostegui.json
  • 11:31 moritzm: installing librsvg bugfix updates from buster point release
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42242 and previous config saved to /var/cache/conftool/dbconfig/20221205-111836-marostegui.json
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:07 hashar: Restarted Zuul to clear a stuck ssh connection with Gerrit - T309376
  • 10:33 kostajh: UTC morning deploys done
  • 10:32 godog: contint1001 - racadm serveraction powercyle - crashed
  • 10:31 kharlan@deploy1002: Finished scap: Backport for User impact: Show discovery notice to mobile users (T323619) (duration: 09m 30s)
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42241 and previous config saved to /var/cache/conftool/dbconfig/20221205-103028-marostegui.json
  • 10:23 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Show discovery notice to mobile users (T323619) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:22 kharlan@deploy1002: Started scap: Backport for User impact: Show discovery notice to mobile users (T323619)
  • 10:14 Emperor: rebalance thanos rings T311690
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42240 and previous config saved to /var/cache/conftool/dbconfig/20221205-100607-marostegui.json
  • 10:05 kharlan@deploy1002: Finished scap: Backport for User impact: Show discovery tour to desktop users who had old module (T323619) (duration: 27m 33s)
  • 09:50 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Show discovery tour to desktop users who had old module (T323619) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 09:39 moritzm: restarting mediawiki canaries to pick up freetype security updates
  • 09:38 godog: force a puppet run on physical hosts to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/860572
  • 09:37 kharlan@deploy1002: Started scap: Backport for User impact: Show discovery tour to desktop users who had old module (T323619)
  • 09:36 moritzm: installing freetype security updates
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42239 and previous config saved to /var/cache/conftool/dbconfig/20221205-091547-marostegui.json
  • 09:15 kharlan@deploy1002: backport aborted: (duration: 00m 25s)
  • 09:14 kharlan@deploy1002: Finished scap: Backport for Fix ExpensiveUserImpact input validation (T324312) (duration: 09m 10s)
  • 09:06 kharlan@deploy1002: kharlan and kharlan: Backport for Fix ExpensiveUserImpact input validation (T324312) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 09:05 kharlan@deploy1002: Started scap: Backport for Fix ExpensiveUserImpact input validation (T324312)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42238 and previous config saved to /var/cache/conftool/dbconfig/20221205-090214-marostegui.json
  • 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59689
  • 09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59689
  • 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58308
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58308
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141731
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141731
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52580
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52580
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42237 and previous config saved to /var/cache/conftool/dbconfig/20221205-085235-marostegui.json
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136907
  • 08:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136907
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55818
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55818
  • 08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38623
  • 08:48 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: End imagerecommendation experiment (T323686) (duration: 09m 26s)
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38623
  • 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4788
  • 08:40 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: End imagerecommendation experiment (T323686) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:38 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: End imagerecommendation experiment (T323686)
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4788
  • 08:35 kartik@deploy1002: Finished scap: Backport for Enable Section Translation on 8 Wikipedias (T319176) (duration: 09m 57s)
  • 08:29 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web,name=eqiad
  • 08:27 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation on 8 Wikipedias (T319176) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:25 kartik@deploy1002: Started scap: Backport for Enable Section Translation on 8 Wikipedias (T319176)
  • 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42236 and previous config saved to /var/cache/conftool/dbconfig/20221205-082320-marostegui.json
  • 08:22 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-web,name=eqiad
  • 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2002.codfw.wmnet,service=thanos-web
  • 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2003.codfw.wmnet,service=thanos-web
  • 08:20 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177) (duration: 17m 25s)
  • 08:11 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:05 dcausse: restarting blazegraph on wdqs1004 (stuck with 2000+ threads, T242453)
  • 08:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177)
  • 07:57 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 07:56 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1003.eqiad.wmnet,service=thanos-web
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P42234 and previous config saved to /var/cache/conftool/dbconfig/20221205-074804-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 100%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42233 and previous config saved to /var/cache/conftool/dbconfig/20221205-074655-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P42232 and previous config saved to /var/cache/conftool/dbconfig/20221205-073259-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 75%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42231 and previous config saved to /var/cache/conftool/dbconfig/20221205-073150-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P42230 and previous config saved to /var/cache/conftool/dbconfig/20221205-071754-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 50%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42229 and previous config saved to /var/cache/conftool/dbconfig/20221205-071645-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P42228 and previous config saved to /var/cache/conftool/dbconfig/20221205-070250-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 25%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42227 and previous config saved to /var/cache/conftool/dbconfig/20221205-070140-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42226 and previous config saved to /var/cache/conftool/dbconfig/20221205-065151-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P42225 and previous config saved to /var/cache/conftool/dbconfig/20221205-064745-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 10%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42224 and previous config saved to /var/cache/conftool/dbconfig/20221205-064635-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42223 and previous config saved to /var/cache/conftool/dbconfig/20221205-063743-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P42222 and previous config saved to /var/cache/conftool/dbconfig/20221205-063240-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 5%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42221 and previous config saved to /var/cache/conftool/dbconfig/20221205-063130-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to dbctl (depooled)', diff saved to https://phabricator.wikimedia.org/P42220 and previous config saved to /var/cache/conftool/dbconfig/20221205-063020-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P42219 and previous config saved to /var/cache/conftool/dbconfig/20221205-061735-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 1%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42218 and previous config saved to /var/cache/conftool/dbconfig/20221205-061625-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P42217 and previous config saved to /var/cache/conftool/dbconfig/20221205-061616-marostegui.json

2022-12-04

  • 04:19 TheresNoTime: T302486 : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`

2022-12-03

  • 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - T321410

2022-12-02

  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:36 volans: fixed git checkout permissions T324334
  • 19:11 sukhe: restart pybal on lvs5004
  • 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
  • 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:20 sukhe: decomm lvs5001: restarting pybal
  • 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - T324334
  • 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
  • 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
  • 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
  • 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
  • 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
  • 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - T324334
  • 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 12:09 jynus: dropping all databases from db1133
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
  • 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - T321309
  • 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
  • 09:54 moritzm: installing debootstrap updates from bullseye point release
  • 09:53 moritzm: rebalance ganeti codfw/C T323222
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
  • 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
  • 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
  • 07:41 moritzm: draining ganeti5001 for eventual decom T322048
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45|46).eqiad.wmnet,cluster=jobrunner
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39|40).eqiad.wmnet,cluster=videoscaler
  • 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster

2022-12-01

  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
  • 22:57 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue (duration: 07m 28s)
  • 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet # T306162
  • 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet # T306162
  • 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:50 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue
  • 22:49 urbanecm@deploy1002: backport aborted: (duration: 00m 03s)
  • 22:42 andrewbogott: upgradedwikitech-static-ord (aka wikitech-static) to Debian Buster, installed php7.4, upgraded MW to 1_39. Will delete the rackspace backup image in a few days.
  • 22:19 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:07 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 cwhite: restart swift-proxy on thanos::frontend eqiad
  • 22:01 brennen: end of utc late backport & config window
  • 21:46 brennen@deploy1002: Finished scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) (duration: 07m 48s)
  • 21:40 brennen@deploy1002: brennen and kharlan: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:38 brennen@deploy1002: Started scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)
  • 21:34 brennen@deploy1002: Finished scap: Backport for New configs for android schemas (duration: 09m 49s)
  • 21:26 brennen@deploy1002: brennen and sharvaniharan: Backport for New configs for android schemas synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:25 andrewbogott: saving an image of wikitech-static-ord (aka wikitech-static) before upgrading the host to Buster
  • 21:25 brennen@deploy1002: Started scap: Backport for New configs for android schemas
  • 21:22 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:21 brennen@deploy1002: Finished scap: Backport for Start writing to cul_actor on test wikis (T233004) (duration: 14m 56s)
  • 21:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw[1307-1326].eqiad.wmnet
  • 21:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:08 brennen@deploy1002: brennen and zabe: Backport for Start writing to cul_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Start writing to cul_actor on test wikis (T233004)
  • 20:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab1004.wikimedia.org
  • 20:47 aokoth@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab1004.wikimedia.org
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 20:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikmiedia.org/T324195
  • 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikmiedia.org/T324195
  • 19:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 19:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 19:44 mutante: gitlab-runner1002 - upgrading gitlab-runner package
  • 19:44 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 19:43 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42201 and previous config saved to /var/cache/conftool/dbconfig/20221201-194301-ladsgroup.json
  • 19:42 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:41 mutante: gitlab2002 (gitlab-replica) - upgrading gitlab-ce
  • 19:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 19:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns5004.wikimedia.org with OS buster
  • 19:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 dancy@deploy1002: Finished scap: testing k8s deployment (duration: 06m 17s)
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42200 and previous config saved to /var/cache/conftool/dbconfig/20221201-192755-ladsgroup.json
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 19:21 dancy@deploy1002: Started scap: testing k8s deployment
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.12 refs T320517
  • 19:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42199 and previous config saved to /var/cache/conftool/dbconfig/20221201-191248-ladsgroup.json
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 dancy@deploy1002: Installation of scap version "4.30.0" completed for 601 hosts
  • 19:01 dancy@deploy1002: Installing scap version "4.30.0" for 601 hosts
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42197 and previous config saved to /var/cache/conftool/dbconfig/20221201-185742-ladsgroup.json
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 18:37 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 18:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 18:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 18:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 18:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42196 and previous config saved to /var/cache/conftool/dbconfig/20221201-181215-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42195 and previous config saved to /var/cache/conftool/dbconfig/20221201-181153-ladsgroup.json
  • 18:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1060']
  • 18:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 18:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42194 and previous config saved to /var/cache/conftool/dbconfig/20221201-175647-ladsgroup.json
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42193 and previous config saved to /var/cache/conftool/dbconfig/20221201-174140-ladsgroup.json
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 17:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 17:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1057']
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42192 and previous config saved to /var/cache/conftool/dbconfig/20221201-172634-ladsgroup.json
  • 17:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42191 and previous config saved to /var/cache/conftool/dbconfig/20221201-171335-ladsgroup.json
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:01 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42190 and previous config saved to /var/cache/conftool/dbconfig/20221201-165828-ladsgroup.json
  • 16:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:55 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:50 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5004
  • 16:50 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5004
  • 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:48 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:46 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42189 and previous config saved to /var/cache/conftool/dbconfig/20221201-164509-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42188 and previous config saved to /var/cache/conftool/dbconfig/20221201-164437-ladsgroup.json
  • 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:43 moritzm: installing ini4j security updates
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42187 and previous config saved to /var/cache/conftool/dbconfig/20221201-164322-ladsgroup.json
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42185 and previous config saved to /var/cache/conftool/dbconfig/20221201-162930-ladsgroup.json
  • 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42184 and previous config saved to /var/cache/conftool/dbconfig/20221201-162815-ladsgroup.json
  • 16:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42183 and previous config saved to /var/cache/conftool/dbconfig/20221201-161424-ladsgroup.json
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 16:00 effie: php7.4 upgrade + apache upgrade + rolling restarts of parsoid servers - T323358
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42182 and previous config saved to /var/cache/conftool/dbconfig/20221201-155917-ladsgroup.json
  • 15:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 15:57 effie: php7.4 upgrade + apache upgrade + rolling restarts of jobrunners/videoscalers servers - T323358
  • 15:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 15:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:41 effie: php7.4 upgrade + apache upgrade + rolling restarts of api servers - T323358
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42181 and previous config saved to /var/cache/conftool/dbconfig/20221201-153918-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42180 and previous config saved to /var/cache/conftool/dbconfig/20221201-153856-ladsgroup.json
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5001.wikimedia.org
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:34 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5001.wikimedia.org
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42179 and previous config saved to /var/cache/conftool/dbconfig/20221201-152350-ladsgroup.json
  • 15:12 effie: php7.4 upgrade + apache upgrade + rolling restarts of app servers - T323358
  • 15:11 sukhe: [done] homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:10 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42178 and previous config saved to /var/cache/conftool/dbconfig/20221201-150843-ladsgroup.json
  • 15:01 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185) (duration: 08m 06s)
  • 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and soda: Backport for Enable limited width on plwikisource MAIN namespace (T323185) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42177 and previous config saved to /var/cache/conftool/dbconfig/20221201-145337-ladsgroup.json
  • 14:52 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185)
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:50 moritzm: installing krb5 security updates
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:45 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) (duration: 06m 12s)
  • 14:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:42 XioNoX: add BGP sessions to RIPE RIS in drmrs
  • 14:40 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:39 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526)
  • 14:36 kharlan@deploy1002: Finished scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) (duration: 06m 04s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:31 kharlan@deploy1002: kharlan and tgr: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:30 kharlan@deploy1002: Started scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854)
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:27 kharlan@deploy1002: Finished scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) (duration: 07m 25s)
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42176 and previous config saved to /var/cache/conftool/dbconfig/20221201-142735-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:21 kharlan@deploy1002: kharlan and kharlan: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:20 kharlan@deploy1002: Started scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42175 and previous config saved to /var/cache/conftool/dbconfig/20221201-132000-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42174 and previous config saved to /var/cache/conftool/dbconfig/20221201-131950-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42172 and previous config saved to /var/cache/conftool/dbconfig/20221201-130443-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42171 and previous config saved to /var/cache/conftool/dbconfig/20221201-125821-ladsgroup.json
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42170 and previous config saved to /var/cache/conftool/dbconfig/20221201-124936-ladsgroup.json
  • 12:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 12:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:43 moritzm: installing glibc security updates on buster
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42169 and previous config saved to /var/cache/conftool/dbconfig/20221201-124314-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42168 and previous config saved to /var/cache/conftool/dbconfig/20221201-123430-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42167 and previous config saved to /var/cache/conftool/dbconfig/20221201-122807-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42166 and previous config saved to /var/cache/conftool/dbconfig/20221201-121301-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42165 and previous config saved to /var/cache/conftool/dbconfig/20221201-120102-ladsgroup.json
  • 11:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42164 and previous config saved to /var/cache/conftool/dbconfig/20221201-114555-ladsgroup.json
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42163 and previous config saved to /var/cache/conftool/dbconfig/20221201-113049-ladsgroup.json
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:18 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) (duration: 06m 56s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42162 and previous config saved to /var/cache/conftool/dbconfig/20221201-111542-ladsgroup.json
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:12 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148)
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42161 and previous config saved to /var/cache/conftool/dbconfig/20221201-110938-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42160 and previous config saved to /var/cache/conftool/dbconfig/20221201-110916-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42159 and previous config saved to /var/cache/conftool/dbconfig/20221201-105938-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42158 and previous config saved to /var/cache/conftool/dbconfig/20221201-105916-ladsgroup.json
  • 10:57 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web
  • 10:56 elukey: deleted knative controller + net-istio controllers on ml-serve-eqiad to clear out some weird state (causing high latencies for the k8s api)
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42157 and previous config saved to /var/cache/conftool/dbconfig/20221201-105410-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42156 and previous config saved to /var/cache/conftool/dbconfig/20221201-104409-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42155 and previous config saved to /var/cache/conftool/dbconfig/20221201-103903-ladsgroup.json
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42154 and previous config saved to /var/cache/conftool/dbconfig/20221201-103448-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42153 and previous config saved to /var/cache/conftool/dbconfig/20221201-103426-ladsgroup.json
  • 10:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42152 and previous config saved to /var/cache/conftool/dbconfig/20221201-102903-ladsgroup.json
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42151 and previous config saved to /var/cache/conftool/dbconfig/20221201-102357-ladsgroup.json
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42150 and previous config saved to /var/cache/conftool/dbconfig/20221201-101920-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42149 and previous config saved to /var/cache/conftool/dbconfig/20221201-101754-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42148 and previous config saved to /var/cache/conftool/dbconfig/20221201-101733-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42147 and previous config saved to /var/cache/conftool/dbconfig/20221201-101356-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42146 and previous config saved to /var/cache/conftool/dbconfig/20221201-100413-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42145 and previous config saved to /var/cache/conftool/dbconfig/20221201-100227-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42144 and previous config saved to /var/cache/conftool/dbconfig/20221201-094907-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42143 and previous config saved to /var/cache/conftool/dbconfig/20221201-094720-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42142 and previous config saved to /var/cache/conftool/dbconfig/20221201-093214-ladsgroup.json
  • 09:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42141 and previous config saved to /var/cache/conftool/dbconfig/20221201-092455-ladsgroup.json
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42140 and previous config saved to /var/cache/conftool/dbconfig/20221201-092434-ladsgroup.json
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:19 kostajh: UTC morning deploys done
  • 09:18 kharlan@deploy1002: Finished scap: Backport for User impact: Fix per-page pageview numbers (T323253) (duration: 08m 31s)
  • 09:15 Emperor: depool, restart, repool swift-proxy on ms-fe1011
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Fix per-page pageview numbers (T323253) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:09 kharlan@deploy1002: Started scap: Backport for User impact: Fix per-page pageview numbers (T323253)
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42139 and previous config saved to /var/cache/conftool/dbconfig/20221201-090927-ladsgroup.json
  • 09:07 moritzm: rebuilding raid on ganeti2013 T323222
  • 09:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2013.codfw.wmnet
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42138 and previous config saved to /var/cache/conftool/dbconfig/20221201-085421-ladsgroup.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:49 volans: restart idrac on mw1334, ipmi and remote ipmi works fine, ssh not responding
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42137 and previous config saved to /var/cache/conftool/dbconfig/20221201-084147-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42136 and previous config saved to /var/cache/conftool/dbconfig/20221201-084125-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42135 and previous config saved to /var/cache/conftool/dbconfig/20221201-084026-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42134 and previous config saved to /var/cache/conftool/dbconfig/20221201-083914-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42131 and previous config saved to /var/cache/conftool/dbconfig/20221201-082619-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42130 and previous config saved to /var/cache/conftool/dbconfig/20221201-082519-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42129 and previous config saved to /var/cache/conftool/dbconfig/20221201-082215-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42128 and previous config saved to /var/cache/conftool/dbconfig/20221201-082154-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42127 and previous config saved to /var/cache/conftool/dbconfig/20221201-081444-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42126 and previous config saved to /var/cache/conftool/dbconfig/20221201-081433-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42125 and previous config saved to /var/cache/conftool/dbconfig/20221201-081112-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42124 and previous config saved to /var/cache/conftool/dbconfig/20221201-081013-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42123 and previous config saved to /var/cache/conftool/dbconfig/20221201-080647-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42122 and previous config saved to /var/cache/conftool/dbconfig/20221201-075927-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42120 and previous config saved to /var/cache/conftool/dbconfig/20221201-075606-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42119 and previous config saved to /var/cache/conftool/dbconfig/20221201-075506-ladsgroup.json
  • 07:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 400474
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42118 and previous config saved to /var/cache/conftool/dbconfig/20221201-075140-ladsgroup.json
  • 07:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 400474
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42117 and previous config saved to /var/cache/conftool/dbconfig/20221201-074420-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42116 and previous config saved to /var/cache/conftool/dbconfig/20221201-073634-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42115 and previous config saved to /var/cache/conftool/dbconfig/20221201-073015-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42114 and previous config saved to /var/cache/conftool/dbconfig/20221201-072914-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42113 and previous config saved to /var/cache/conftool/dbconfig/20221201-072659-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42111 and previous config saved to /var/cache/conftool/dbconfig/20221201-071641-ladsgroup.json
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42110 and previous config saved to /var/cache/conftool/dbconfig/20221201-071615-ladsgroup.json
  • 07:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42109 and previous config saved to /var/cache/conftool/dbconfig/20221201-071153-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T323547', diff saved to https://phabricator.wikimedia.org/P42108 and previous config saved to /var/cache/conftool/dbconfig/20221201-070758-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T323547', diff saved to https://phabricator.wikimedia.org/P42107 and previous config saved to /var/cache/conftool/dbconfig/20221201-070203-ladsgroup.json
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T323547', diff saved to https://phabricator.wikimedia.org/P42106 and previous config saved to /var/cache/conftool/dbconfig/20221201-070131-ladsgroup.json
  • 07:01 Amir1: Starting s1 eqiad failover from db1163 to db1118 - T323547
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42105 and previous config saved to /var/cache/conftool/dbconfig/20221201-070108-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42104 and previous config saved to /var/cache/conftool/dbconfig/20221201-065737-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42103 and previous config saved to /var/cache/conftool/dbconfig/20221201-065646-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42102 and previous config saved to /var/cache/conftool/dbconfig/20221201-064602-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42101 and previous config saved to /var/cache/conftool/dbconfig/20221201-064230-ladsgroup.json
  • 06:42 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 06:42 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42100 and previous config saved to /var/cache/conftool/dbconfig/20221201-064140-ladsgroup.json
  • 06:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42099 and previous config saved to /var/cache/conftool/dbconfig/20221201-063930-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42098 and previous config saved to /var/cache/conftool/dbconfig/20221201-063908-ladsgroup.json
  • 06:36 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42097 and previous config saved to /var/cache/conftool/dbconfig/20221201-063055-ladsgroup.json
  • 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42096 and previous config saved to /var/cache/conftool/dbconfig/20221201-062724-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42095 and previous config saved to /var/cache/conftool/dbconfig/20221201-062402-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42094 and previous config saved to /var/cache/conftool/dbconfig/20221201-061218-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42093 and previous config saved to /var/cache/conftool/dbconfig/20221201-060855-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42092 and previous config saved to /var/cache/conftool/dbconfig/20221201-060230-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42091 and previous config saved to /var/cache/conftool/dbconfig/20221201-060206-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T323547', diff saved to https://phabricator.wikimedia.org/P42090 and previous config saved to /var/cache/conftool/dbconfig/20221201-060157-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42089 and previous config saved to /var/cache/conftool/dbconfig/20221201-055359-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42088 and previous config saved to /var/cache/conftool/dbconfig/20221201-055349-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42087 and previous config saved to /var/cache/conftool/dbconfig/20221201-055337-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42086 and previous config saved to /var/cache/conftool/dbconfig/20221201-055239-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42085 and previous config saved to /var/cache/conftool/dbconfig/20221201-055218-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42084 and previous config saved to /var/cache/conftool/dbconfig/20221201-055142-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42083 and previous config saved to /var/cache/conftool/dbconfig/20221201-055120-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42082 and previous config saved to /var/cache/conftool/dbconfig/20221201-054653-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42081 and previous config saved to /var/cache/conftool/dbconfig/20221201-053831-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42080 and previous config saved to /var/cache/conftool/dbconfig/20221201-053711-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42079 and previous config saved to /var/cache/conftool/dbconfig/20221201-053613-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42078 and previous config saved to /var/cache/conftool/dbconfig/20221201-053147-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42077 and previous config saved to /var/cache/conftool/dbconfig/20221201-052524-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42076 and previous config saved to /var/cache/conftool/dbconfig/20221201-052325-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42075 and previous config saved to /var/cache/conftool/dbconfig/20221201-052223-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42074 and previous config saved to /var/cache/conftool/dbconfig/20221201-052205-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42073 and previous config saved to /var/cache/conftool/dbconfig/20221201-052107-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42072 and previous config saved to /var/cache/conftool/dbconfig/20221201-052014-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42071 and previous config saved to /var/cache/conftool/dbconfig/20221201-051942-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42070 and previous config saved to /var/cache/conftool/dbconfig/20221201-051640-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42069 and previous config saved to /var/cache/conftool/dbconfig/20221201-050818-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42068 and previous config saved to /var/cache/conftool/dbconfig/20221201-050658-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42067 and previous config saved to /var/cache/conftool/dbconfig/20221201-050600-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42066 and previous config saved to /var/cache/conftool/dbconfig/20221201-050548-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42065 and previous config saved to /var/cache/conftool/dbconfig/20221201-050527-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42064 and previous config saved to /var/cache/conftool/dbconfig/20221201-050435-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42063 and previous config saved to /var/cache/conftool/dbconfig/20221201-045020-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42062 and previous config saved to /var/cache/conftool/dbconfig/20221201-044929-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42061 and previous config saved to /var/cache/conftool/dbconfig/20221201-044053-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42060 and previous config saved to /var/cache/conftool/dbconfig/20221201-044031-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42059 and previous config saved to /var/cache/conftool/dbconfig/20221201-043514-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42058 and previous config saved to /var/cache/conftool/dbconfig/20221201-043422-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42057 and previous config saved to /var/cache/conftool/dbconfig/20221201-043315-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42056 and previous config saved to /var/cache/conftool/dbconfig/20221201-043253-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42055 and previous config saved to /var/cache/conftool/dbconfig/20221201-042525-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42054 and previous config saved to /var/cache/conftool/dbconfig/20221201-042251-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42053 and previous config saved to /var/cache/conftool/dbconfig/20221201-042229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42052 and previous config saved to /var/cache/conftool/dbconfig/20221201-042008-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42051 and previous config saved to /var/cache/conftool/dbconfig/20221201-041758-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42050 and previous config saved to /var/cache/conftool/dbconfig/20221201-041747-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42049 and previous config saved to /var/cache/conftool/dbconfig/20221201-041652-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42048 and previous config saved to /var/cache/conftool/dbconfig/20221201-041322-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42047 and previous config saved to /var/cache/conftool/dbconfig/20221201-041018-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42046 and previous config saved to /var/cache/conftool/dbconfig/20221201-040723-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42045 and previous config saved to /var/cache/conftool/dbconfig/20221201-040240-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42044 and previous config saved to /var/cache/conftool/dbconfig/20221201-040145-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42043 and previous config saved to /var/cache/conftool/dbconfig/20221201-035816-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42042 and previous config saved to /var/cache/conftool/dbconfig/20221201-035512-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42041 and previous config saved to /var/cache/conftool/dbconfig/20221201-035216-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42040 and previous config saved to /var/cache/conftool/dbconfig/20221201-034734-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42039 and previous config saved to /var/cache/conftool/dbconfig/20221201-034639-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42038 and previous config saved to /var/cache/conftool/dbconfig/20221201-034627-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42037 and previous config saved to /var/cache/conftool/dbconfig/20221201-034527-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42036 and previous config saved to /var/cache/conftool/dbconfig/20221201-034309-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42035 and previous config saved to /var/cache/conftool/dbconfig/20221201-033710-ladsgroup.json
  • 03:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS buster
  • 03:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42034 and previous config saved to /var/cache/conftool/dbconfig/20221201-033449-ladsgroup.json
  • 03:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42033 and previous config saved to /var/cache/conftool/dbconfig/20221201-033132-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42032 and previous config saved to /var/cache/conftool/dbconfig/20221201-033020-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42031 and previous config saved to /var/cache/conftool/dbconfig/20221201-032922-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42030 and previous config saved to /var/cache/conftool/dbconfig/20221201-032901-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42029 and previous config saved to /var/cache/conftool/dbconfig/20221201-032803-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42028 and previous config saved to /var/cache/conftool/dbconfig/20221201-032553-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42027 and previous config saved to /var/cache/conftool/dbconfig/20221201-032531-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42026 and previous config saved to /var/cache/conftool/dbconfig/20221201-031608-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42025 and previous config saved to /var/cache/conftool/dbconfig/20221201-031546-ladsgroup.json
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42024 and previous config saved to /var/cache/conftool/dbconfig/20221201-031514-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42023 and previous config saved to /var/cache/conftool/dbconfig/20221201-031354-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42022 and previous config saved to /var/cache/conftool/dbconfig/20221201-031024-ladsgroup.json
  • 03:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42021 and previous config saved to /var/cache/conftool/dbconfig/20221201-030040-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42020 and previous config saved to /var/cache/conftool/dbconfig/20221201-030007-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42019 and previous config saved to /var/cache/conftool/dbconfig/20221201-025900-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42018 and previous config saved to /var/cache/conftool/dbconfig/20221201-025848-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42017 and previous config saved to /var/cache/conftool/dbconfig/20221201-025838-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42016 and previous config saved to /var/cache/conftool/dbconfig/20221201-025517-ladsgroup.json
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42015 and previous config saved to /var/cache/conftool/dbconfig/20221201-024533-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42014 and previous config saved to /var/cache/conftool/dbconfig/20221201-024341-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42013 and previous config saved to /var/cache/conftool/dbconfig/20221201-024331-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42012 and previous config saved to /var/cache/conftool/dbconfig/20221201-024131-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P42011 and previous config saved to /var/cache/conftool/dbconfig/20221201-024110-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42010 and previous config saved to /var/cache/conftool/dbconfig/20221201-024011-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42009 and previous config saved to /var/cache/conftool/dbconfig/20221201-023801-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P42008 and previous config saved to /var/cache/conftool/dbconfig/20221201-023750-ladsgroup.json
  • 02:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42007 and previous config saved to /var/cache/conftool/dbconfig/20221201-023027-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42006 and previous config saved to /var/cache/conftool/dbconfig/20221201-022825-ladsgroup.json
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42005 and previous config saved to /var/cache/conftool/dbconfig/20221201-022603-ladsgroup.json
  • 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P42004 and previous config saved to /var/cache/conftool/dbconfig/20221201-022244-ladsgroup.json
  • 02:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42003 and previous config saved to /var/cache/conftool/dbconfig/20221201-021318-ladsgroup.json
  • 02:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42002 and previous config saved to /var/cache/conftool/dbconfig/20221201-021211-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:12 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P42001 and previous config saved to /var/cache/conftool/dbconfig/20221201-021149-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42000 and previous config saved to /var/cache/conftool/dbconfig/20221201-021057-ladsgroup.json
  • 02:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 02:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P41999 and previous config saved to /var/cache/conftool/dbconfig/20221201-020737-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P41998 and previous config saved to /var/cache/conftool/dbconfig/20221201-020308-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41997 and previous config saved to /var/cache/conftool/dbconfig/20221201-015643-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41996 and previous config saved to /var/cache/conftool/dbconfig/20221201-015550-ladsgroup.json
  • 01:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41995 and previous config saved to /var/cache/conftool/dbconfig/20221201-015340-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P41994 and previous config saved to /var/cache/conftool/dbconfig/20221201-015332-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41993 and previous config saved to /var/cache/conftool/dbconfig/20221201-015230-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P41992 and previous config saved to /var/cache/conftool/dbconfig/20221201-015115-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41991 and previous config saved to /var/cache/conftool/dbconfig/20221201-015020-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41990 and previous config saved to /var/cache/conftool/dbconfig/20221201-015010-ladsgroup.json
  • 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41989 and previous config saved to /var/cache/conftool/dbconfig/20221201-014136-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41988 and previous config saved to /var/cache/conftool/dbconfig/20221201-013503-ladsgroup.json
  • 01:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41987 and previous config saved to /var/cache/conftool/dbconfig/20221201-012630-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41986 and previous config saved to /var/cache/conftool/dbconfig/20221201-012522-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41985 and previous config saved to /var/cache/conftool/dbconfig/20221201-012500-ladsgroup.json
  • 01:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS buster
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41984 and previous config saved to /var/cache/conftool/dbconfig/20221201-011957-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41983 and previous config saved to /var/cache/conftool/dbconfig/20221201-010954-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41982 and previous config saved to /var/cache/conftool/dbconfig/20221201-010450-ladsgroup.json
  • 01:04 ejegg: payments-wiki upgraded from 96c74911 to c52a6a39
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41981 and previous config saved to /var/cache/conftool/dbconfig/20221201-010240-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41980 and previous config saved to /var/cache/conftool/dbconfig/20221201-010219-ladsgroup.json
  • 00:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41979 and previous config saved to /var/cache/conftool/dbconfig/20221201-005447-ladsgroup.json
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41978 and previous config saved to /var/cache/conftool/dbconfig/20221201-004712-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41977 and previous config saved to /var/cache/conftool/dbconfig/20221201-003941-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41976 and previous config saved to /var/cache/conftool/dbconfig/20221201-003533-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T322618)', diff saved to https://phabricator.wikimedia.org/P41975 and previous config saved to /var/cache/conftool/dbconfig/20221201-003511-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41974 and previous config saved to /var/cache/conftool/dbconfig/20221201-003205-ladsgroup.json
  • 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS buster
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1206.eqiad.wmnet with OS bullseye
  • 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41973 and previous config saved to /var/cache/conftool/dbconfig/20221201-002005-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41972 and previous config saved to /var/cache/conftool/dbconfig/20221201-001659-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41971 and previous config saved to /var/cache/conftool/dbconfig/20221201-001449-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T322618)', diff saved to https://phabricator.wikimedia.org/P41970 and previous config saved to /var/cache/conftool/dbconfig/20221201-001427-ladsgroup.json
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41969 and previous config saved to /var/cache/conftool/dbconfig/20221201-000458-ladsgroup.json

Other archives

2000s

2010s

2020s