20:54 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:28 urbanecm@deploy1002: arlolra and urbanecm: Backport for Fix description link icon positioning (T329364) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:28 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04c11e6]
20:07 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
20:05 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Enables ab test for multiple languages (T336969) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
17:18 ladsgroup@deploy1002: ladsgroup: Backport for Remove legacy encoding option from dawiktionary (T128155) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
17:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2037,2039,2041].codfw.wmnet} and A:cp
16:59 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
15:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_codfw
14:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T329049)
13:39 ottomata: destroy mw-page-content-change-enrich deployment in dse-k8s-eqiad in order to deploy in wikikube - T330507
13:35 godog: rm cadvisor.service symlink/alias and restart kubelet on affected hosts - T337836
13:33 urbanecm@deploy1002: mdsshakil and urbanecm: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
12:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
12:03 jayme: re-enabling puppet on all kubernetes hosts
11:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
11:20 apergos: rebooted dumpsdata1006 manually after several timeouts trying to use the cookbook; in the end, forced to powercycle the host via mgmt console
11:18 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2001.codfw.wmnet
11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
11:12 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
10:21 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
10:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
10:17 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
10:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
10:12 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
10:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1002.eqiad.wmnet
10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
10:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
10:07 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
10:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
10:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
09:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc1002.eqiad.wmnet
09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2004-dev.wikimedia.org
09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:49 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48665 and previous config saved to /var/cache/conftool/dbconfig/20230531-093659-root.json
09:36 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2023-05-30
20:03 samtar@deploy1002: ksarabia and samtar: Backport for Turn on A/B Test Hebrew (T336969) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305. (duration: 00m 09s)
19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305.
13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
09:19 arturo: ran as aborrero@cumin1001: sudo cumin "P{R:Profile::Mariadb::Section = 's7'} and P{P:wmcs::db::wikireplicas::mariadb_multiinstance}" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
09:17 tgr@deploy1002: tgr: Backport for Improve handling of missing image recommendation synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
08:30 tgr@deploy1002: tgr: Backport for Improve logging of invalid image recommendation kinds synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
08:01 tgr@deploy1002: tgr: Backport for Section images: Accept more recommendation types synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf T322145
07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:16 moritzm: update bookworm installer to rc4 T330495
07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11 refs T337525
2023-05-29
15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474 (duration: 00m 08s)
12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474
11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474 (duration: 00m 08s)
11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474
10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - T329366
2023-05-25
18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct (T328313)
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 T337446
14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000 to dumpsdata1006, copying all public xml dumps data
09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - T337248
07:37 mlitn@deploy1002: mlitn: Backport for Change maint script to do work via jobs (T322872) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
2023-05-24
20:48 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:18 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation | T332474
14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for Enable Kartographer Nearby on remaining wikis (T336834) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:17 samtar@deploy1002: samtar and dcausse: Backport for [cirrus] Fix typo in config var synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:07 samtar@deploy1002: herron and samtar: Backport for arclamp: switch redis server to arclamp1001 (T327277) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
01:19 mutante: contint2001 - jenkins started again
01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
00:45 mutante: short maintenance on main contint server (jenkins)
00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
2023-05-23
23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - T324659
20:51 samtar@deploy1002: ksarabia and samtar: Backport for Turn on the A/B test for testwiki (T336969) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts T337327
18:26 ryankemper: [WDQS] T337327 Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
18:16 ryankemper: [WDQS] Rolled back requestctl rule
18:12 ryankemper: [WDQS] T337327 New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
16:42 sbassett: Deployed updated security mitigation for T336027
16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately | T214068
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T337206', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068 (duration: 00m 07s)
07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068
07:41 marostegui@deploy1002: marostegui: Backport for Revert "db-production.php: Disable writes in es5" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
07:19 marostegui@deploy1002: marostegui: Backport for db-production.php: Disable writes in es5 (T337285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337285
23:16 zabe@deploy1002: Finished scap: Backport for Enable VE on new wikis (duration: 06m 58s)
23:11 zabe@deploy1002: zabe: Backport for Enable VE on new wikis synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:38 sbassett: Deployed security mitigations for T333140 and T336027
20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:17 samtar@deploy1002: samtar and superpes: Backport for [kaawiki] Enable SandboxLink extension (T336648) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
10:02 moritzm: installing updated usb.ids packages for Bullseye
10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T337206', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 T337204', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - T337204
06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 T337204', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 T337203', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - T337203
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 T337203', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json
09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
15:21 taavi@deploy1002: legoktm and taavi: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
14:58 legoktm@deploy1002: legoktm: Backport for Disable GWToolset from Commons (T270911) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
14:35 sukhe: enable puppet on A:lvs, finished rolling out change
14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad (T322937)
13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia T330884
08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
07:11 moritzm: installing emacs security updates
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl T336725', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json
2023-05-18
23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs T330215
22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - T332355
22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
22:20 mutante: short down-time for zuul-merger on contint2001
21:47 mutante: maintenance for zuul (CI) on contint servers
21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Reverts hewiki A/B test (T335309) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
13:07 samtar@deploy1002: samtar and s-mukuti: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 06m 19s)
12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 07m 00s)
12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet
10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl T336833', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json
04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
2023-05-17
22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update T331297
19:01 Amir1: Removing db1112 from zarcillo T336332
18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet
18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for T244570
16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json
16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json
16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json
16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json
15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json
15:52 brett: Rolling out maglev LVS scheduler in esams - T263797
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json
15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json
15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json
15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json
15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json
15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json
15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json
15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json
15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json
14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs101[6-9]*} and A:aqs
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json
14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json
14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json
14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - T336817 (duration: 06m 20s)
14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs101[2-5]*} and A:aqs
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs102[0-1]*} and A:aqs
13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
13:40 taavi@deploy1002: taavi and maurelio: Backport for dblists: Close akwiki (T336675) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
13:32 taavi@deploy1002: stang and taavi: Backport for plwiki: Show language selector in main page header (T336707) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json
12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json
12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json
12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json
12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json
12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json
12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json
11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json
11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json
11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json
11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json
11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json
11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json
11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json
11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json
11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json
11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json
11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json
11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json
10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json
10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json
10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json
10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json
10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json
10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json
10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 (T335845)', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json
09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json
07:48 moritzm: upgrading krb1001 to Bullseye T331695
07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json
07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468
07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T336725', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json
07:09 kartik@deploy1002: Backport cancelled.
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json
06:01 volans: restarted ferm on ms-be1047
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json
05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json
05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet
05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet
05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl T336332', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json
05:16 marostegui: Optimize s7 on dbstore1003 T336733
2023-05-16
19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
19:04 sukhe: dummy run of authdns-update to confirm new hosts
19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org
19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org
17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
16:06 mutante: gitlab-runner2003 - installed rsync client for debugging an issue with rsync from inside containers, comparing to from outside container
14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - T335042
14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - T335042
13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad
13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
13:25 taavi@deploy1002: mazevedo and taavi: Backport for Add stream config for mobile apps schema (T336508) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance T335042
10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - T317799
10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
07:28 Emperor: restart vopsbot.service on alert1001
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:18 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
2023-05-15
17:15 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
15:00 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
14:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
14:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48228 and previous config saved to /var/cache/conftool/dbconfig/20230515-111624-ladsgroup.json
11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48227 and previous config saved to /var/cache/conftool/dbconfig/20230515-110118-ladsgroup.json
10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48226 and previous config saved to /var/cache/conftool/dbconfig/20230515-104611-ladsgroup.json
10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48225 and previous config saved to /var/cache/conftool/dbconfig/20230515-103105-ladsgroup.json
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48224 and previous config saved to /var/cache/conftool/dbconfig/20230515-102038-ladsgroup.json
10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
10:19 Amir1: Removing db1123 from zarcillo T334910
10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1123.eqiad.wmnet
10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48223 and previous config saved to /var/cache/conftool/dbconfig/20230515-101329-ladsgroup.json
10:13 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1123.eqiad.wmnet
09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48222 and previous config saved to /var/cache/conftool/dbconfig/20230515-095823-ladsgroup.json
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1123 from dbctl T334910', diff saved to https://phabricator.wikimedia.org/P48221 and previous config saved to /var/cache/conftool/dbconfig/20230515-095412-ladsgroup.json
09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1123 T334910', diff saved to https://phabricator.wikimedia.org/P48220 and previous config saved to /var/cache/conftool/dbconfig/20230515-094938-ladsgroup.json
09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48219 and previous config saved to /var/cache/conftool/dbconfig/20230515-094317-ladsgroup.json
09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15802
09:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15802
09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48218 and previous config saved to /var/cache/conftool/dbconfig/20230515-092810-ladsgroup.json
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48217 and previous config saved to /var/cache/conftool/dbconfig/20230515-091139-ladsgroup.json
09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
2023-05-12
14:36 cdanis: silencing jobrunner/videoscaler probes for the weekend
14:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.wmnet
14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
01:08 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus6001.drmrs.wmnet
01:08 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:08 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
01:07 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
00:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus6001.drmrs.wmnet
00:51 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5001.eqsin.wmnet
00:51 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:51 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
00:50 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
00:44 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5001.eqsin.wmnet
00:32 denisse: manually removing prometheus4001.ulsfo.wmnet from the Ganeti master after a failed step in the decommission cookbook - T335585
00:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance
00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance
2023-05-11
23:39 mutante: LDAP - added uid lorenjohnson to groups wmde and nda T335858
23:39 mutante: LDAP - added uid roti to groups wmde and nda T336435
21:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:10 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
21:07 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1225.eqiad.wmnet
21:07 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
20:32 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus4001.ulsfo.wment
20:31 urbanecm@deploy1002: urbanecm: Backport for [Growth] Remove config variables provided by extension synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:22 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment
20:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:21 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
20:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
20:17 denisse: manually remove prometheus3001.esams.wmnet from the ganeti master after a failed step in the decommission cookbook.
20:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
20:14 thcipriani@deploy1002: bd808 and thcipriani: Backport for Allow http://localhost callback URL (T299737) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
19:51 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
17:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
17:46 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48203 and previous config saved to /var/cache/conftool/dbconfig/20230511-164125-ladsgroup.json
16:37 brennen: train 1.41.0-wmf.8 (T330214): rolling back to group1 to test for T336504 presence/absence on enwiki
16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48201 and previous config saved to /var/cache/conftool/dbconfig/20230511-162619-ladsgroup.json
16:16 elukey: benthos webrequest live instances migrated to kafka-franz (new consumer client, data may have some holes) - T331801
16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48200 and previous config saved to /var/cache/conftool/dbconfig/20230511-161113-ladsgroup.json
16:08 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
16:01 Amir1: Removing db1110 from zarcillo T335011
16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1110.eqiad.wmnet
16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
15:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48199 and previous config saved to /var/cache/conftool/dbconfig/20230511-155607-ladsgroup.json
15:48 hashar: CI back up and fully operational (after the Gerrit upgrade)
15:48 mutante: gerrit maintenance period ended - gerrit switched to new hardware, IP and distro version
15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48198 and previous config saved to /var/cache/conftool/dbconfig/20230511-154533-ladsgroup.json
15:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1110.eqiad.wmnet
15:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
15:27 sukhe: [done] running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
15:27 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
15:21 sukhe: running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
15:18 urandom: altering image_suggestions schema (generated data platform) - T336424
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48197 and previous config saved to /var/cache/conftool/dbconfig/20230511-144959-ladsgroup.json
14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48195 and previous config saved to /var/cache/conftool/dbconfig/20230511-143453-ladsgroup.json
14:27 moritzm: installing avahi security updates
14:26 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2012
14:26 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2012
14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48194 and previous config saved to /var/cache/conftool/dbconfig/20230511-141947-ladsgroup.json
14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48192 and previous config saved to /var/cache/conftool/dbconfig/20230511-140440-ladsgroup.json
12:48 ladsgroup@deploy1002: ladsgroup: Backport for Add outreachwiki to wikidataclient dblist (T171140) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
12:41 Amir1: creating wikidata client tables for outreachwiki (T171140)
12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
12:01 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
11:57 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
11:54 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48190 and previous config saved to /var/cache/conftool/dbconfig/20230511-115201-root.json
11:39 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48189 and previous config saved to /var/cache/conftool/dbconfig/20230511-113657-root.json
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48187 and previous config saved to /var/cache/conftool/dbconfig/20230511-112152-root.json
11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48186 and previous config saved to /var/cache/conftool/dbconfig/20230511-110647-root.json
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48185 and previous config saved to /var/cache/conftool/dbconfig/20230511-105142-root.json
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48184 and previous config saved to /var/cache/conftool/dbconfig/20230511-103638-root.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48182 and previous config saved to /var/cache/conftool/dbconfig/20230511-100628-root.json
09:35 moritzm: installing distro-info-data updates on buster
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137', diff saved to https://phabricator.wikimedia.org/P48181 and previous config saved to /var/cache/conftool/dbconfig/20230511-092848-root.json
08:59 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
08:56 jelto@cumin1001: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
08:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
08:40 elukey: `apt-get clean` on orespoolcounter nodes to free space in the root partition
08:33 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
08:13 moritzm: installing Linux 4.19.282 updates on Buster systems
08:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330214
08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
08:05 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:cassandra-dev
07:43 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
07:41 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
07:39 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
07:14 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 20940
20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48177 and previous config saved to /var/cache/conftool/dbconfig/20230510-202014-ladsgroup.json
20:14 cjming@deploy1002: cjming and jdlrobson: Backport for Remove unnecessary jQuery closure (T324913) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48176 and previous config saved to /var/cache/conftool/dbconfig/20230510-200508-ladsgroup.json
20:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48175 and previous config saved to /var/cache/conftool/dbconfig/20230510-195001-ladsgroup.json
19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48174 and previous config saved to /var/cache/conftool/dbconfig/20230510-193455-ladsgroup.json
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48173 and previous config saved to /var/cache/conftool/dbconfig/20230510-192746-ladsgroup.json
19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48172 and previous config saved to /var/cache/conftool/dbconfig/20230510-192722-ladsgroup.json
19:25 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48171 and previous config saved to /var/cache/conftool/dbconfig/20230510-191216-ladsgroup.json
19:08 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
19:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48170 and previous config saved to /var/cache/conftool/dbconfig/20230510-185710-ladsgroup.json
18:54 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172]
18:45 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 191m 53s)
18:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48169 and previous config saved to /var/cache/conftool/dbconfig/20230510-184202-ladsgroup.json
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48168 and previous config saved to /var/cache/conftool/dbconfig/20230510-183441-ladsgroup.json
18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48167 and previous config saved to /var/cache/conftool/dbconfig/20230510-183418-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48166 and previous config saved to /var/cache/conftool/dbconfig/20230510-181912-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48165 and previous config saved to /var/cache/conftool/dbconfig/20230510-180406-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48164 and previous config saved to /var/cache/conftool/dbconfig/20230510-174859-ladsgroup.json
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48163 and previous config saved to /var/cache/conftool/dbconfig/20230510-174143-ladsgroup.json
17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48162 and previous config saved to /var/cache/conftool/dbconfig/20230510-174119-ladsgroup.json
17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48161 and previous config saved to /var/cache/conftool/dbconfig/20230510-172613-ladsgroup.json
17:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48160 and previous config saved to /var/cache/conftool/dbconfig/20230510-171107-ladsgroup.json
16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48159 and previous config saved to /var/cache/conftool/dbconfig/20230510-165601-ladsgroup.json
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
16:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48158 and previous config saved to /var/cache/conftool/dbconfig/20230510-164842-ladsgroup.json
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48157 and previous config saved to /var/cache/conftool/dbconfig/20230510-164818-ladsgroup.json
16:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48156 and previous config saved to /var/cache/conftool/dbconfig/20230510-163312-ladsgroup.json
16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
16:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48155 and previous config saved to /var/cache/conftool/dbconfig/20230510-161806-ladsgroup.json
16:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
16:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48154 and previous config saved to /var/cache/conftool/dbconfig/20230510-160258-ladsgroup.json
15:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48153 and previous config saved to /var/cache/conftool/dbconfig/20230510-155429-ladsgroup.json
15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48152 and previous config saved to /var/cache/conftool/dbconfig/20230510-155357-ladsgroup.json
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
15:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48151 and previous config saved to /var/cache/conftool/dbconfig/20230510-153851-ladsgroup.json
15:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
15:33 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48150 and previous config saved to /var/cache/conftool/dbconfig/20230510-152345-ladsgroup.json
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48149 and previous config saved to /var/cache/conftool/dbconfig/20230510-150838-ladsgroup.json
15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48148 and previous config saved to /var/cache/conftool/dbconfig/20230510-150009-ladsgroup.json
15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48147 and previous config saved to /var/cache/conftool/dbconfig/20230510-145946-ladsgroup.json
14:58 cwhite: install vopsbot 0.3.4 on alert2001 T329791
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48146 and previous config saved to /var/cache/conftool/dbconfig/20230510-144440-ladsgroup.json
14:44 moritzm: restarting FPM/Apache on mw canaries to pick up libxml2 updates
14:41 moritzm: installing libxml2 security updates on buster
14:40 thcipriani: stopping gerrit on gerrit1001
14:40 thcipriani: stopping gerrit on gerrit1003
14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48145 and previous config saved to /var/cache/conftool/dbconfig/20230510-142934-ladsgroup.json
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48144 and previous config saved to /var/cache/conftool/dbconfig/20230510-141427-ladsgroup.json
14:08 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
14:08 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48143 and previous config saved to /var/cache/conftool/dbconfig/20230510-140708-ladsgroup.json
14:07 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
14:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48142 and previous config saved to /var/cache/conftool/dbconfig/20230510-140644-ladsgroup.json
14:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48140 and previous config saved to /var/cache/conftool/dbconfig/20230510-135138-ladsgroup.json
13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48139 and previous config saved to /var/cache/conftool/dbconfig/20230510-133632-ladsgroup.json
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48138 and previous config saved to /var/cache/conftool/dbconfig/20230510-132126-ladsgroup.json
13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48137 and previous config saved to /var/cache/conftool/dbconfig/20230510-131412-ladsgroup.json
13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48136 and previous config saved to /var/cache/conftool/dbconfig/20230510-131347-ladsgroup.json
13:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
13:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48135 and previous config saved to /var/cache/conftool/dbconfig/20230510-125840-ladsgroup.json
12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
12:52 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48134 and previous config saved to /var/cache/conftool/dbconfig/20230510-124334-ladsgroup.json
12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48133 and previous config saved to /var/cache/conftool/dbconfig/20230510-122828-ladsgroup.json
12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48132 and previous config saved to /var/cache/conftool/dbconfig/20230510-122316-ladsgroup.json
12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48131 and previous config saved to /var/cache/conftool/dbconfig/20230510-122253-ladsgroup.json
12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48129 and previous config saved to /var/cache/conftool/dbconfig/20230510-120747-ladsgroup.json
11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48128 and previous config saved to /var/cache/conftool/dbconfig/20230510-115241-ladsgroup.json
11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48127 and previous config saved to /var/cache/conftool/dbconfig/20230510-113734-ladsgroup.json
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48126 and previous config saved to /var/cache/conftool/dbconfig/20230510-113215-ladsgroup.json
11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48125 and previous config saved to /var/cache/conftool/dbconfig/20230510-112855-ladsgroup.json
11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
11:18 _joe_: installing vopsbot 0.3.4 on alert1001 T329791
11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48124 and previous config saved to /var/cache/conftool/dbconfig/20230510-111349-ladsgroup.json
11:11 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48123 and previous config saved to /var/cache/conftool/dbconfig/20230510-105843-ladsgroup.json
10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48122 and previous config saved to /var/cache/conftool/dbconfig/20230510-104337-ladsgroup.json
10:38 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48121 and previous config saved to /var/cache/conftool/dbconfig/20230510-103712-ladsgroup.json
10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48120 and previous config saved to /var/cache/conftool/dbconfig/20230510-103649-ladsgroup.json
10:26 Amir1: Removing db1113 from zarcillo T336029
10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48119 and previous config saved to /var/cache/conftool/dbconfig/20230510-102302-ladsgroup.json
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48118 and previous config saved to /var/cache/conftool/dbconfig/20230510-102142-ladsgroup.json
10:21 Amir1: start of clean up of echo notification in wikidatawiki (T318523)
10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1113.eqiad.wmnet
10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:16 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1113.eqiad.wmnet
10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48117 and previous config saved to /var/cache/conftool/dbconfig/20230510-100756-ladsgroup.json
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48116 and previous config saved to /var/cache/conftool/dbconfig/20230510-100636-ladsgroup.json
10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2004.codfw.wmnet
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48115 and previous config saved to /var/cache/conftool/dbconfig/20230510-095309-root.json
09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48114 and previous config saved to /var/cache/conftool/dbconfig/20230510-095250-ladsgroup.json
09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48113 and previous config saved to /var/cache/conftool/dbconfig/20230510-095130-ladsgroup.json
09:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
09:50 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1004.eqiad.wmnet
09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48112 and previous config saved to /var/cache/conftool/dbconfig/20230510-094452-ladsgroup.json
09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48111 and previous config saved to /var/cache/conftool/dbconfig/20230510-094429-ladsgroup.json
09:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48110 and previous config saved to /var/cache/conftool/dbconfig/20230510-093804-root.json
09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48109 and previous config saved to /var/cache/conftool/dbconfig/20230510-093743-ladsgroup.json
09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48107 and previous config saved to /var/cache/conftool/dbconfig/20230510-092923-ladsgroup.json
09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48106 and previous config saved to /var/cache/conftool/dbconfig/20230510-092531-ladsgroup.json
09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48105 and previous config saved to /var/cache/conftool/dbconfig/20230510-092507-ladsgroup.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48104 and previous config saved to /var/cache/conftool/dbconfig/20230510-092259-root.json
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48103 and previous config saved to /var/cache/conftool/dbconfig/20230510-091624-root.json
09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48102 and previous config saved to /var/cache/conftool/dbconfig/20230510-091417-ladsgroup.json
09:12 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48101 and previous config saved to /var/cache/conftool/dbconfig/20230510-091001-ladsgroup.json
09:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48100 and previous config saved to /var/cache/conftool/dbconfig/20230510-090755-root.json
09:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
09:01 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
09:01 hashar: Gerrit restarted at version 3.5.6 | T336339
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48099 and previous config saved to /var/cache/conftool/dbconfig/20230510-090119-root.json
08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48098 and previous config saved to /var/cache/conftool/dbconfig/20230510-085910-ladsgroup.json
08:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 05s)
08:57 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
08:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 09s)
08:56 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
08:55 hashar: Stopping Gerrit for 3.5.5 > 3.5.6 upgrade T336339
08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48097 and previous config saved to /var/cache/conftool/dbconfig/20230510-085455-ladsgroup.json
08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48096 and previous config saved to /var/cache/conftool/dbconfig/20230510-085330-ladsgroup.json
08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48095 and previous config saved to /var/cache/conftool/dbconfig/20230510-085250-root.json
08:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
08:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339 (duration: 00m 07s)
08:49 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339
08:48 hashar: deploy1002: git reset `/srv/deployment/gerrit/gerrit`, which had a bunch of locally modified files for some reason # T336339
08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48094 and previous config saved to /var/cache/conftool/dbconfig/20230510-084614-root.json
08:40 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
08:40 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
08:39 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48093 and previous config saved to /var/cache/conftool/dbconfig/20230510-083948-ladsgroup.json
08:39 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48092 and previous config saved to /var/cache/conftool/dbconfig/20230510-083745-root.json
08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48091 and previous config saved to /var/cache/conftool/dbconfig/20230510-083253-ladsgroup.json
08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48090 and previous config saved to /var/cache/conftool/dbconfig/20230510-083109-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48089 and previous config saved to /var/cache/conftool/dbconfig/20230510-082240-root.json
22:42 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
22:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48058 and previous config saved to /var/cache/conftool/dbconfig/20230509-223346-ladsgroup.json
22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
22:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
22:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48057 and previous config saved to /var/cache/conftool/dbconfig/20230509-221840-ladsgroup.json
22:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
22:06 inflatador: bking@wcqs1002 depool wcqs1002 while it catches up on lag
22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48056 and previous config saved to /var/cache/conftool/dbconfig/20230509-220333-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48055 and previous config saved to /var/cache/conftool/dbconfig/20230509-214827-ladsgroup.json
21:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
21:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48054 and previous config saved to /var/cache/conftool/dbconfig/20230509-213834-ladsgroup.json
21:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48053 and previous config saved to /var/cache/conftool/dbconfig/20230509-213808-ladsgroup.json
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48052 and previous config saved to /var/cache/conftool/dbconfig/20230509-212302-ladsgroup.json
21:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
21:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48051 and previous config saved to /var/cache/conftool/dbconfig/20230509-210755-ladsgroup.json
20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48050 and previous config saved to /var/cache/conftool/dbconfig/20230509-205249-ladsgroup.json
20:52 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
17:31 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
17:31 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
17:31 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
17:28 aokoth@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host vrts2001.codfw.wmnet with OS bullseye
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48048 and previous config saved to /var/cache/conftool/dbconfig/20230509-172826-ladsgroup.json
17:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
17:17 rzl: rolling restart apache on eqiad appservers T225778
17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48047 and previous config saved to /var/cache/conftool/dbconfig/20230509-171320-ladsgroup.json
17:11 rzl: rolling restart apache on codfw appservers T225778
17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
17:00 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_ulsfo
17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48046 and previous config saved to /var/cache/conftool/dbconfig/20230509-165813-ladsgroup.json
16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
16:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
16:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
16:46 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48045 and previous config saved to /var/cache/conftool/dbconfig/20230509-164307-ladsgroup.json
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48044 and previous config saved to /var/cache/conftool/dbconfig/20230509-163646-ladsgroup.json
16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48043 and previous config saved to /var/cache/conftool/dbconfig/20230509-163621-ladsgroup.json
16:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
16:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
16:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2012']
16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48042 and previous config saved to /var/cache/conftool/dbconfig/20230509-162904-ladsgroup.json
16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48041 and previous config saved to /var/cache/conftool/dbconfig/20230509-162115-ladsgroup.json
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48039 and previous config saved to /var/cache/conftool/dbconfig/20230509-161358-ladsgroup.json
16:08 jnuche@deploy1002: Installing scap version "4.52.1" for 593 hosts
16:07 rzl: stopping puppet on appservers - T225778
16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48038 and previous config saved to /var/cache/conftool/dbconfig/20230509-160608-ladsgroup.json
16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48037 and previous config saved to /var/cache/conftool/dbconfig/20230509-155852-ladsgroup.json
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48036 and previous config saved to /var/cache/conftool/dbconfig/20230509-155102-ladsgroup.json
15:50 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
15:48 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
15:48 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48035 and previous config saved to /var/cache/conftool/dbconfig/20230509-154346-ladsgroup.json
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48034 and previous config saved to /var/cache/conftool/dbconfig/20230509-154338-ladsgroup.json
15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48033 and previous config saved to /var/cache/conftool/dbconfig/20230509-154313-ladsgroup.json
15:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48032 and previous config saved to /var/cache/conftool/dbconfig/20230509-153715-ladsgroup.json
15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48031 and previous config saved to /var/cache/conftool/dbconfig/20230509-153651-ladsgroup.json
15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48030 and previous config saved to /var/cache/conftool/dbconfig/20230509-152804-ladsgroup.json
15:23 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48029 and previous config saved to /var/cache/conftool/dbconfig/20230509-152145-ladsgroup.json
15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
15:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48028 and previous config saved to /var/cache/conftool/dbconfig/20230509-151258-ladsgroup.json
15:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48027 and previous config saved to /var/cache/conftool/dbconfig/20230509-150639-ladsgroup.json
15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2180']
14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48026 and previous config saved to /var/cache/conftool/dbconfig/20230509-145752-ladsgroup.json
14:54 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 45m 45s)
14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48025 and previous config saved to /var/cache/conftool/dbconfig/20230509-145133-ladsgroup.json
14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48024 and previous config saved to /var/cache/conftool/dbconfig/20230509-145128-ladsgroup.json
14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48023 and previous config saved to /var/cache/conftool/dbconfig/20230509-145057-ladsgroup.json
14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2008.codfw.wmnet
14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48022 and previous config saved to /var/cache/conftool/dbconfig/20230509-144457-ladsgroup.json
14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48021 and previous config saved to /var/cache/conftool/dbconfig/20230509-144433-ladsgroup.json
14:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48020 and previous config saved to /var/cache/conftool/dbconfig/20230509-143550-ladsgroup.json
14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2008.codfw.wmnet
14:32 sukhe: decommission lvs2008
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48019 and previous config saved to /var/cache/conftool/dbconfig/20230509-142927-ladsgroup.json
14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1010
14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1010
14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48018 and previous config saved to /var/cache/conftool/dbconfig/20230509-142044-ladsgroup.json
14:15 sukhe: set routing-options static route 208.80.153.240/28 next-hop 10.192.49.7 [move static route for high-traffic2 to lvs2010]: T335777
14:15 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48017 and previous config saved to /var/cache/conftool/dbconfig/20230509-141421-ladsgroup.json
14:08 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48016 and previous config saved to /var/cache/conftool/dbconfig/20230509-140535-ladsgroup.json
13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48015 and previous config saved to /var/cache/conftool/dbconfig/20230509-135915-ladsgroup.json
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48014 and previous config saved to /var/cache/conftool/dbconfig/20230509-135815-ladsgroup.json
13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48013 and previous config saved to /var/cache/conftool/dbconfig/20230509-135750-ladsgroup.json
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48012 and previous config saved to /var/cache/conftool/dbconfig/20230509-134952-ladsgroup.json
13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48011 and previous config saved to /var/cache/conftool/dbconfig/20230509-134929-ladsgroup.json
13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 T336031', diff saved to https://phabricator.wikimedia.org/P48010 and previous config saved to /var/cache/conftool/dbconfig/20230509-134921-root.json
13:44 moritzm: rearmed keyholder on netmon* post reboot
13:43 taavi@deploy1002: taavi: Backport for Add $wmgUseRealMe (T324535) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48009 and previous config saved to /var/cache/conftool/dbconfig/20230509-134244-ladsgroup.json
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48008 and previous config saved to /var/cache/conftool/dbconfig/20230509-133416-ladsgroup.json
13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
13:28 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48007 and previous config saved to /var/cache/conftool/dbconfig/20230509-132737-ladsgroup.json
13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
13:23 taavi@deploy1002: taavi: Backport for Add RealMe to extension-list (T324535) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-worker1088.eqiad.wmnet
13:23 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-worker1088.eqiad.wmnet
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48006 and previous config saved to /var/cache/conftool/dbconfig/20230509-131910-ladsgroup.json
13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48005 and previous config saved to /var/cache/conftool/dbconfig/20230509-131231-ladsgroup.json
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48004 and previous config saved to /var/cache/conftool/dbconfig/20230509-130524-ladsgroup.json
13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P48003 and previous config saved to /var/cache/conftool/dbconfig/20230509-130459-ladsgroup.json
13:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync after adding ldap-rw servers - jmm@cumin2002"
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48002 and previous config saved to /var/cache/conftool/dbconfig/20230509-130404-ladsgroup.json
12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
12:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bullseye
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48001 and previous config saved to /var/cache/conftool/dbconfig/20230509-125644-ladsgroup.json
12:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
12:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P48000 and previous config saved to /var/cache/conftool/dbconfig/20230509-125620-ladsgroup.json
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47999 and previous config saved to /var/cache/conftool/dbconfig/20230509-124953-ladsgroup.json
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
12:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47997 and previous config saved to /var/cache/conftool/dbconfig/20230509-124114-ladsgroup.json
12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47996 and previous config saved to /var/cache/conftool/dbconfig/20230509-123447-ladsgroup.json
12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw2001.wikimedia.org with OS bullseye
12:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47995 and previous config saved to /var/cache/conftool/dbconfig/20230509-122608-ladsgroup.json
12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47994 and previous config saved to /var/cache/conftool/dbconfig/20230509-121941-ladsgroup.json
12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aphlict1001.eqiad.wmnet
12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47992 and previous config saved to /var/cache/conftool/dbconfig/20230509-121119-ladsgroup.json
12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47991 and previous config saved to /var/cache/conftool/dbconfig/20230509-121102-ladsgroup.json
12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47990 and previous config saved to /var/cache/conftool/dbconfig/20230509-121053-ladsgroup.json
12:06 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47989 and previous config saved to /var/cache/conftool/dbconfig/20230509-120433-ladsgroup.json
12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47988 and previous config saved to /var/cache/conftool/dbconfig/20230509-120410-ladsgroup.json
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47987 and previous config saved to /var/cache/conftool/dbconfig/20230509-115547-ladsgroup.json
11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47986 and previous config saved to /var/cache/conftool/dbconfig/20230509-114903-ladsgroup.json
11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47985 and previous config saved to /var/cache/conftool/dbconfig/20230509-114041-ladsgroup.json
11:36 kart_: Updated MinT to 2023-05-09-110213-production (T331505, T335725)
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47984 and previous config saved to /var/cache/conftool/dbconfig/20230509-113357-ladsgroup.json
11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47983 and previous config saved to /var/cache/conftool/dbconfig/20230509-112535-ladsgroup.json
11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47982 and previous config saved to /var/cache/conftool/dbconfig/20230509-111851-ladsgroup.json
11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47981 and previous config saved to /var/cache/conftool/dbconfig/20230509-111755-ladsgroup.json
11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47980 and previous config saved to /var/cache/conftool/dbconfig/20230509-111730-ladsgroup.json
11:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47979 and previous config saved to /var/cache/conftool/dbconfig/20230509-111235-ladsgroup.json
11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47978 and previous config saved to /var/cache/conftool/dbconfig/20230509-111211-ladsgroup.json
11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw1001.wikimedia.org with OS bullseye
11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47977 and previous config saved to /var/cache/conftool/dbconfig/20230509-110222-ladsgroup.json
10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47976 and previous config saved to /var/cache/conftool/dbconfig/20230509-105704-ladsgroup.json
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47975 and previous config saved to /var/cache/conftool/dbconfig/20230509-104715-ladsgroup.json
10:45 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:42 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47974 and previous config saved to /var/cache/conftool/dbconfig/20230509-104158-ladsgroup.json
10:39 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
10:36 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
10:36 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
10:32 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cloudcontrol2001-dev.wikimedia.org
10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47973 and previous config saved to /var/cache/conftool/dbconfig/20230509-103209-ladsgroup.json
10:29 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47972 and previous config saved to /var/cache/conftool/dbconfig/20230509-102652-ladsgroup.json
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47971 and previous config saved to /var/cache/conftool/dbconfig/20230509-102644-ladsgroup.json
10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47970 and previous config saved to /var/cache/conftool/dbconfig/20230509-102619-ladsgroup.json
10:26 volans@cumin1001: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary
10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
10:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47969 and previous config saved to /var/cache/conftool/dbconfig/20230509-102001-ladsgroup.json
10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47968 and previous config saved to /var/cache/conftool/dbconfig/20230509-101938-ladsgroup.json
10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47967 and previous config saved to /var/cache/conftool/dbconfig/20230509-101113-ladsgroup.json
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47966 and previous config saved to /var/cache/conftool/dbconfig/20230509-100431-ladsgroup.json
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47965 and previous config saved to /var/cache/conftool/dbconfig/20230509-095607-ladsgroup.json
09:55 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:53 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47964 and previous config saved to /var/cache/conftool/dbconfig/20230509-094925-ladsgroup.json
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47962 and previous config saved to /var/cache/conftool/dbconfig/20230509-094100-ladsgroup.json
09:37 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3003.esams.wmnet
09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47961 and previous config saved to /var/cache/conftool/dbconfig/20230509-093419-ladsgroup.json
09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47960 and previous config saved to /var/cache/conftool/dbconfig/20230509-093320-ladsgroup.json
09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping3003.esams.wmnet
09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47959 and previous config saved to /var/cache/conftool/dbconfig/20230509-092843-ladsgroup.json
09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
09:23 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
09:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.8 refs T330214
09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2003.codfw.wmnet
09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
08:40 marostegui: Stop mariadb on db1115 (old zarcillo master) T334455
08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
08:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw1001.wikimedia.org
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw1001.wikimedia.org on all recursors
08:36 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw1001.wikimedia.org on all recursors
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
08:30 marostegui: Failover m5-master from dbproxy1021 to dbproxy1017
08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw1001.wikimedia.org
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw2001.wikimedia.org
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
08:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw2001.wikimedia.org on all recursors
08:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw2001.wikimedia.org on all recursors
08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:16 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
08:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
08:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
08:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
08:08 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
08:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw2001.wikimedia.org
08:04 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
06:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
06:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
05:28 marostegui: Starting db-inventory eqiad failover from db1115 to db1215 - T335014
05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
00:00 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor everywhere (T334295) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47958 and previous config saved to /var/cache/conftool/dbconfig/20230508-233832-ladsgroup.json
23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47957 and previous config saved to /var/cache/conftool/dbconfig/20230508-232325-ladsgroup.json
23:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47956 and previous config saved to /var/cache/conftool/dbconfig/20230508-230819-ladsgroup.json
22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47955 and previous config saved to /var/cache/conftool/dbconfig/20230508-225313-ladsgroup.json
22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47954 and previous config saved to /var/cache/conftool/dbconfig/20230508-224657-ladsgroup.json
22:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47953 and previous config saved to /var/cache/conftool/dbconfig/20230508-224622-ladsgroup.json
22:34 eileen: config revision changed from 7ac11236 to 48f7485f - disabled populate contribution tracking
22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47952 and previous config saved to /var/cache/conftool/dbconfig/20230508-223115-ladsgroup.json
22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47951 and previous config saved to /var/cache/conftool/dbconfig/20230508-221609-ladsgroup.json
22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47950 and previous config saved to /var/cache/conftool/dbconfig/20230508-220103-ladsgroup.json
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47949 and previous config saved to /var/cache/conftool/dbconfig/20230508-215323-ladsgroup.json
21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47948 and previous config saved to /var/cache/conftool/dbconfig/20230508-215300-ladsgroup.json
21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47947 and previous config saved to /var/cache/conftool/dbconfig/20230508-213754-ladsgroup.json
21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47946 and previous config saved to /var/cache/conftool/dbconfig/20230508-212248-ladsgroup.json
21:21 mforns@deploy1002: Started deploy [airflow-dags/analytics@a6a3ceb]: (no justification provided)
21:18 mstyles@deploy1002: mstyles and sbassett: Backport for Disable translation memory on collabwiki (T313241) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1001.eqiad.wmnet
21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47945 and previous config saved to /var/cache/conftool/dbconfig/20230508-210742-ladsgroup.json
21:02 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1001.eqiad.wmnet
21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47944 and previous config saved to /var/cache/conftool/dbconfig/20230508-210119-ladsgroup.json
21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47943 and previous config saved to /var/cache/conftool/dbconfig/20230508-210056-ladsgroup.json
20:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1002.eqiad.wmnet
20:53 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1002.eqiad.wmnet
20:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1004.eqiad.wmnet
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47942 and previous config saved to /var/cache/conftool/dbconfig/20230508-204549-ladsgroup.json
20:43 mutante: miscweb2003 - rebooting
20:41 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1004.eqiad.wmnet
20:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
20:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
20:39 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1003.eqiad.wmnet
20:36 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts1001.eqiad.wmnet with OS bullseye
20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
20:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
20:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1003.eqiad.wmnet
20:31 taavi@deploy1002: jdlrobson and taavi: Backport for Deploy fixed width indicator to wikis (T335307) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47941 and previous config saved to /var/cache/conftool/dbconfig/20230508-203043-ladsgroup.json
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47939 and previous config saved to /var/cache/conftool/dbconfig/20230508-201537-ladsgroup.json
20:11 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts1001.eqiad.wmnet with OS bullseye
20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47937 and previous config saved to /var/cache/conftool/dbconfig/20230508-200802-ladsgroup.json
20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1001.eqiad.wmnet
20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
20:04 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1001.eqiad.wmnet on all recursors
20:04 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1001.eqiad.wmnet on all recursors
20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
20:03 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
19:59 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1001.eqiad.wmnet
19:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1005.eqiad.wmnet
19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47936 and previous config saved to /var/cache/conftool/dbconfig/20230508-195256-ladsgroup.json
19:52 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
19:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2004.codfw.wmnet
19:46 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1005.eqiad.wmnet
19:45 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2004.codfw.wmnet
19:45 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2004.codfw.wmnet
19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2006.codfw.wmnet
19:39 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2004.codfw.wmnet
19:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2005.codfw.wmnet
19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47935 and previous config saved to /var/cache/conftool/dbconfig/20230508-193750-ladsgroup.json
19:34 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2006.codfw.wmnet
19:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2005.codfw.wmnet
19:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2003.codfw.wmnet
19:24 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2003.codfw.wmnet
19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47934 and previous config saved to /var/cache/conftool/dbconfig/20230508-192243-ladsgroup.json
19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
19:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2006.codfw.wmnet
19:20 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2006.codfw.wmnet
19:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
19:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47933 and previous config saved to /var/cache/conftool/dbconfig/20230508-191630-ladsgroup.json
19:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
19:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47932 and previous config saved to /var/cache/conftool/dbconfig/20230508-191607-ladsgroup.json
19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 19 hosts with reason: rebooting to help with lag
19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 19 hosts with reason: rebooting to help with lag
19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47931 and previous config saved to /var/cache/conftool/dbconfig/20230508-190100-ladsgroup.json
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47930 and previous config saved to /var/cache/conftool/dbconfig/20230508-184554-ladsgroup.json
18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47929 and previous config saved to /var/cache/conftool/dbconfig/20230508-183048-ladsgroup.json
18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47928 and previous config saved to /var/cache/conftool/dbconfig/20230508-182350-ladsgroup.json
18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47927 and previous config saved to /var/cache/conftool/dbconfig/20230508-182327-ladsgroup.json
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47926 and previous config saved to /var/cache/conftool/dbconfig/20230508-180820-ladsgroup.json
18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
18:04 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 113m 03s)
18:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2002.codfw.wmnet
18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47925 and previous config saved to /var/cache/conftool/dbconfig/20230508-180239-ladsgroup.json
17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2002.codfw.wmnet
17:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2001.codfw.wmnet
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47923 and previous config saved to /var/cache/conftool/dbconfig/20230508-175314-ladsgroup.json
17:48 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2001.codfw.wmnet
17:48 sukhe: restart pybal on lvs2011 to pick up BGP MED change: T326767
17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47922 and previous config saved to /var/cache/conftool/dbconfig/20230508-174732-ladsgroup.json
17:39 sukhe: homer "cr*-codfw*" commit "Gerrit: 914871 add new LVS host lvs2011": T326767
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47920 and previous config saved to /var/cache/conftool/dbconfig/20230508-173808-ladsgroup.json
17:38 volans: installed spicerack 7.0.0 on cumin1001
17:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1001.eqiad.wmnet
17:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
17:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2011
17:35 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2011
17:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2011.codfw.wmnet
17:33 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2011.codfw.wmnet
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47919 and previous config saved to /var/cache/conftool/dbconfig/20230508-173226-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47918 and previous config saved to /var/cache/conftool/dbconfig/20230508-173152-ladsgroup.json
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
17:31 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
17:31 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
17:29 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
17:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
17:28 volans: installed spicerack 7.0.0 on cumin2002
17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
17:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47917 and previous config saved to /var/cache/conftool/dbconfig/20230508-171720-ladsgroup.json
17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47916 and previous config saved to /var/cache/conftool/dbconfig/20230508-170902-ladsgroup.json
17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47915 and previous config saved to /var/cache/conftool/dbconfig/20230508-170828-ladsgroup.json
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47914 and previous config saved to /var/cache/conftool/dbconfig/20230508-170542-ladsgroup.json
16:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
16:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47913 and previous config saved to /var/cache/conftool/dbconfig/20230508-165322-ladsgroup.json
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47912 and previous config saved to /var/cache/conftool/dbconfig/20230508-165036-ladsgroup.json
16:46 volans: uploaded spicerack_7.0.0 to apt.wikimedia.org bullseye-wikimedia
16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47910 and previous config saved to /var/cache/conftool/dbconfig/20230508-163816-ladsgroup.json
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47909 and previous config saved to /var/cache/conftool/dbconfig/20230508-163530-ladsgroup.json
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47908 and previous config saved to /var/cache/conftool/dbconfig/20230508-162309-ladsgroup.json
16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47907 and previous config saved to /var/cache/conftool/dbconfig/20230508-162024-ladsgroup.json
16:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47906 and previous config saved to /var/cache/conftool/dbconfig/20230508-161313-ladsgroup.json
16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47905 and previous config saved to /var/cache/conftool/dbconfig/20230508-161258-ladsgroup.json
16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47904 and previous config saved to /var/cache/conftool/dbconfig/20230508-161235-ladsgroup.json
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47903 and previous config saved to /var/cache/conftool/dbconfig/20230508-161234-ladsgroup.json
16:11 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47902 and previous config saved to /var/cache/conftool/dbconfig/20230508-155729-ladsgroup.json
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47901 and previous config saved to /var/cache/conftool/dbconfig/20230508-155728-ladsgroup.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47900 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47899 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47898 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47897 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
15:25 moritzm: installing grep updates from Bullseye 11.7 point release
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47896 and previous config saved to /var/cache/conftool/dbconfig/20230508-151952-ladsgroup.json
15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47895 and previous config saved to /var/cache/conftool/dbconfig/20230508-151929-ladsgroup.json
15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47894 and previous config saved to /var/cache/conftool/dbconfig/20230508-151556-ladsgroup.json
15:12 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004": T326688
15:09 sukhe: homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004"
15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47893 and previous config saved to /var/cache/conftool/dbconfig/20230508-150423-ladsgroup.json
15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47892 and previous config saved to /var/cache/conftool/dbconfig/20230508-150050-ladsgroup.json
14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2004.wikimedia.org
14:57 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for dns2004.wikimedia.org
14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bullseye
14:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330214
14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47891 and previous config saved to /var/cache/conftool/dbconfig/20230508-144916-ladsgroup.json
14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47890 and previous config saved to /var/cache/conftool/dbconfig/20230508-144544-ladsgroup.json
14:40 brennen: train 1.41.0-wmf.7 (T330213): proceeding to all wikis
14:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47889 and previous config saved to /var/cache/conftool/dbconfig/20230508-143410-ladsgroup.json
14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47888 and previous config saved to /var/cache/conftool/dbconfig/20230508-143038-ladsgroup.json
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47887 and previous config saved to /var/cache/conftool/dbconfig/20230508-142543-ladsgroup.json
14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47886 and previous config saved to /var/cache/conftool/dbconfig/20230508-142520-ladsgroup.json
14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47885 and previous config saved to /var/cache/conftool/dbconfig/20230508-142427-ladsgroup.json
14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47884 and previous config saved to /var/cache/conftool/dbconfig/20230508-142302-ladsgroup.json
14:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47883 and previous config saved to /var/cache/conftool/dbconfig/20230508-142237-ladsgroup.json
14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bullseye
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47882 and previous config saved to /var/cache/conftool/dbconfig/20230508-141014-ladsgroup.json
14:09 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1001.eqiad.wmnet
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47881 and previous config saved to /var/cache/conftool/dbconfig/20230508-140731-ladsgroup.json
13:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47880 and previous config saved to /var/cache/conftool/dbconfig/20230508-135508-ladsgroup.json
13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47879 and previous config saved to /var/cache/conftool/dbconfig/20230508-135224-ladsgroup.json
13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47878 and previous config saved to /var/cache/conftool/dbconfig/20230508-134002-ladsgroup.json
13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47877 and previous config saved to /var/cache/conftool/dbconfig/20230508-133718-ladsgroup.json
13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47876 and previous config saved to /var/cache/conftool/dbconfig/20230508-133034-ladsgroup.json
13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47875 and previous config saved to /var/cache/conftool/dbconfig/20230508-133011-ladsgroup.json
13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47874 and previous config saved to /var/cache/conftool/dbconfig/20230508-132957-ladsgroup.json
13:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
13:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47873 and previous config saved to /var/cache/conftool/dbconfig/20230508-132932-ladsgroup.json
13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47872 and previous config saved to /var/cache/conftool/dbconfig/20230508-131504-ladsgroup.json
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47871 and previous config saved to /var/cache/conftool/dbconfig/20230508-131426-ladsgroup.json
13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47870 and previous config saved to /var/cache/conftool/dbconfig/20230508-125958-ladsgroup.json
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47869 and previous config saved to /var/cache/conftool/dbconfig/20230508-125920-ladsgroup.json
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47868 and previous config saved to /var/cache/conftool/dbconfig/20230508-124452-ladsgroup.json
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47867 and previous config saved to /var/cache/conftool/dbconfig/20230508-124414-ladsgroup.json
12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
12:40 topranks: rebooting cloudsw1-b1-codfw for OS upgrade T333316
12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
12:38 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47866 and previous config saved to /var/cache/conftool/dbconfig/20230508-123654-ladsgroup.json
12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47865 and previous config saved to /var/cache/conftool/dbconfig/20230508-123624-ladsgroup.json
12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47864 and previous config saved to /var/cache/conftool/dbconfig/20230508-123614-ladsgroup.json
12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47863 and previous config saved to /var/cache/conftool/dbconfig/20230508-123554-ladsgroup.json
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
12:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47862 and previous config saved to /var/cache/conftool/dbconfig/20230508-122108-ladsgroup.json
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47861 and previous config saved to /var/cache/conftool/dbconfig/20230508-122048-ladsgroup.json
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bullseye
12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47860 and previous config saved to /var/cache/conftool/dbconfig/20230508-120602-ladsgroup.json
12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47859 and previous config saved to /var/cache/conftool/dbconfig/20230508-120542-ladsgroup.json
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47858 and previous config saved to /var/cache/conftool/dbconfig/20230508-115056-ladsgroup.json
11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47857 and previous config saved to /var/cache/conftool/dbconfig/20230508-115036-ladsgroup.json
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47856 and previous config saved to /var/cache/conftool/dbconfig/20230508-114417-ladsgroup.json
11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47855 and previous config saved to /var/cache/conftool/dbconfig/20230508-114354-ladsgroup.json
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47854 and previous config saved to /var/cache/conftool/dbconfig/20230508-114336-ladsgroup.json
11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47853 and previous config saved to /var/cache/conftool/dbconfig/20230508-114312-ladsgroup.json
11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bullseye
11:32 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47851 and previous config saved to /var/cache/conftool/dbconfig/20230508-112848-ladsgroup.json
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47850 and previous config saved to /var/cache/conftool/dbconfig/20230508-112805-ladsgroup.json
11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47849 and previous config saved to /var/cache/conftool/dbconfig/20230508-111342-ladsgroup.json
11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47848 and previous config saved to /var/cache/conftool/dbconfig/20230508-111259-ladsgroup.json
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1113 from dbctl T336029', diff saved to https://phabricator.wikimedia.org/P47847 and previous config saved to /var/cache/conftool/dbconfig/20230508-111113-marostegui.json
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47846 and previous config saved to /var/cache/conftool/dbconfig/20230508-110812-root.json
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47845 and previous config saved to /var/cache/conftool/dbconfig/20230508-110803-root.json
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47844 and previous config saved to /var/cache/conftool/dbconfig/20230508-110756-root.json
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47843 and previous config saved to /var/cache/conftool/dbconfig/20230508-110755-root.json
11:04 duesen: config deployment failed because gitlab is down. Prod is out of sync with gerrit, and deploy1002 is in sync with gerrit. Will come back to this in an hour.
10:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47842 and previous config saved to /var/cache/conftool/dbconfig/20230508-105835-ladsgroup.json
10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47841 and previous config saved to /var/cache/conftool/dbconfig/20230508-105753-ladsgroup.json
10:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
10:54 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47840 and previous config saved to /var/cache/conftool/dbconfig/20230508-105307-root.json
10:53 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47839 and previous config saved to /var/cache/conftool/dbconfig/20230508-105258-root.json
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47838 and previous config saved to /var/cache/conftool/dbconfig/20230508-105252-root.json
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47837 and previous config saved to /var/cache/conftool/dbconfig/20230508-105250-root.json
10:52 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47836 and previous config saved to /var/cache/conftool/dbconfig/20230508-105215-ladsgroup.json
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47835 and previous config saved to /var/cache/conftool/dbconfig/20230508-105141-ladsgroup.json
10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-replica.wikimedia.org on all recursors
10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-replica.wikimedia.org on all recursors
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47834 and previous config saved to /var/cache/conftool/dbconfig/20230508-105032-ladsgroup.json
10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab.wikimedia.org on all recursors
10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab.wikimedia.org on all recursors
10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47833 and previous config saved to /var/cache/conftool/dbconfig/20230508-105007-ladsgroup.json
10:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
10:45 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
10:44 daniel@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
10:41 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47832 and previous config saved to /var/cache/conftool/dbconfig/20230508-103802-root.json
10:37 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47831 and previous config saved to /var/cache/conftool/dbconfig/20230508-103754-root.json
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47830 and previous config saved to /var/cache/conftool/dbconfig/20230508-103747-root.json
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47829 and previous config saved to /var/cache/conftool/dbconfig/20230508-103745-root.json
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47828 and previous config saved to /var/cache/conftool/dbconfig/20230508-103634-ladsgroup.json
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47827 and previous config saved to /var/cache/conftool/dbconfig/20230508-103501-ladsgroup.json
10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
10:27 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
10:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47826 and previous config saved to /var/cache/conftool/dbconfig/20230508-102258-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47825 and previous config saved to /var/cache/conftool/dbconfig/20230508-102249-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47824 and previous config saved to /var/cache/conftool/dbconfig/20230508-102242-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47823 and previous config saved to /var/cache/conftool/dbconfig/20230508-102240-root.json
10:22 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47822 and previous config saved to /var/cache/conftool/dbconfig/20230508-102128-ladsgroup.json
10:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47821 and previous config saved to /var/cache/conftool/dbconfig/20230508-101955-ladsgroup.json
10:18 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47820 and previous config saved to /var/cache/conftool/dbconfig/20230508-100753-root.json
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47819 and previous config saved to /var/cache/conftool/dbconfig/20230508-100744-root.json
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47818 and previous config saved to /var/cache/conftool/dbconfig/20230508-100737-root.json
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47817 and previous config saved to /var/cache/conftool/dbconfig/20230508-100736-root.json
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47816 and previous config saved to /var/cache/conftool/dbconfig/20230508-100622-ladsgroup.json
10:04 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47815 and previous config saved to /var/cache/conftool/dbconfig/20230508-100449-ladsgroup.json
10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47814 and previous config saved to /var/cache/conftool/dbconfig/20230508-100003-ladsgroup.json
09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47813 and previous config saved to /var/cache/conftool/dbconfig/20230508-095928-ladsgroup.json
09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47812 and previous config saved to /var/cache/conftool/dbconfig/20230508-095724-ladsgroup.json
09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47811 and previous config saved to /var/cache/conftool/dbconfig/20230508-095659-ladsgroup.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47810 and previous config saved to /var/cache/conftool/dbconfig/20230508-095248-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47809 and previous config saved to /var/cache/conftool/dbconfig/20230508-095240-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47808 and previous config saved to /var/cache/conftool/dbconfig/20230508-095233-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47807 and previous config saved to /var/cache/conftool/dbconfig/20230508-095231-root.json
09:48 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47806 and previous config saved to /var/cache/conftool/dbconfig/20230508-094422-ladsgroup.json
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47805 and previous config saved to /var/cache/conftool/dbconfig/20230508-094153-ladsgroup.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47804 and previous config saved to /var/cache/conftool/dbconfig/20230508-093743-root.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47803 and previous config saved to /var/cache/conftool/dbconfig/20230508-093735-root.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47802 and previous config saved to /var/cache/conftool/dbconfig/20230508-093728-root.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47801 and previous config saved to /var/cache/conftool/dbconfig/20230508-093726-root.json
09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47800 and previous config saved to /var/cache/conftool/dbconfig/20230508-092916-ladsgroup.json
09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47799 and previous config saved to /var/cache/conftool/dbconfig/20230508-092647-ladsgroup.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47798 and previous config saved to /var/cache/conftool/dbconfig/20230508-092232-root.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47797 and previous config saved to /var/cache/conftool/dbconfig/20230508-092223-root.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47796 and previous config saved to /var/cache/conftool/dbconfig/20230508-092221-root.json
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47794 and previous config saved to /var/cache/conftool/dbconfig/20230508-091408-ladsgroup.json
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47793 and previous config saved to /var/cache/conftool/dbconfig/20230508-091140-ladsgroup.json
09:05 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47792 and previous config saved to /var/cache/conftool/dbconfig/20230508-090521-ladsgroup.json
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47791 and previous config saved to /var/cache/conftool/dbconfig/20230508-090456-ladsgroup.json
09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 es2025 es2022 for reboots', diff saved to https://phabricator.wikimedia.org/P47790 and previous config saved to /var/cache/conftool/dbconfig/20230508-085435-root.json
08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47789 and previous config saved to /var/cache/conftool/dbconfig/20230508-084950-ladsgroup.json
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47788 and previous config saved to /var/cache/conftool/dbconfig/20230508-083444-ladsgroup.json
08:27 vgutierrez: HAProxy updated to 2.6.13 on cp1077 and cp1085 - T334448
08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47787 and previous config saved to /var/cache/conftool/dbconfig/20230508-081937-ladsgroup.json
08:18 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
08:17 marostegui: Failover m3-master from dbproxy1020 to dbproxy1016
08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47786 and previous config saved to /var/cache/conftool/dbconfig/20230508-081415-ladsgroup.json
08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47785 and previous config saved to /var/cache/conftool/dbconfig/20230508-081353-ladsgroup.json
08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
07:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
07:59 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:51 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
08:03 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
07:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
07:44 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
07:07 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
06:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
2023-05-05
23:24 tzatziki: removing emails from 230 users per self-requests
17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
17:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
17:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
17:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
16:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
16:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
16:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
16:35 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
16:35 btullis@cumin1001: Added views for new wiki: newiki T334041
16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:28 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
16:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1023.eqiad.wmnet
16:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1024.eqiad.wmnet
16:17 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
15:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1020.eqiad.wmnet
15:41 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:41 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
15:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1019.eqiad.wmnet
15:40 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1019.eqiad.wmnet
15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1020.eqiad.wmnet
15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47778 and previous config saved to /var/cache/conftool/dbconfig/20230505-152222-ladsgroup.json
15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47777 and previous config saved to /var/cache/conftool/dbconfig/20230505-150716-ladsgroup.json
15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@11fa4e1]: (no justification provided)
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47776 and previous config saved to /var/cache/conftool/dbconfig/20230505-145209-ladsgroup.json
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47774 and previous config saved to /var/cache/conftool/dbconfig/20230505-143703-ladsgroup.json
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47773 and previous config saved to /var/cache/conftool/dbconfig/20230505-142940-ladsgroup.json
14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
14:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47772 and previous config saved to /var/cache/conftool/dbconfig/20230505-142917-ladsgroup.json
14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47771 and previous config saved to /var/cache/conftool/dbconfig/20230505-141410-ladsgroup.json
14:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
14:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
14:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host lvs2011.codfw.wmnet
13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47770 and previous config saved to /var/cache/conftool/dbconfig/20230505-135904-ladsgroup.json
13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
13:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47769 and previous config saved to /var/cache/conftool/dbconfig/20230505-134358-ladsgroup.json
13:39 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
13:39 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47768 and previous config saved to /var/cache/conftool/dbconfig/20230505-133631-ladsgroup.json
13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47767 and previous config saved to /var/cache/conftool/dbconfig/20230505-133556-ladsgroup.json
13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1005.eqiad.wmnet
13:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
13:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
13:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
13:25 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1005.eqiad.wmnet
13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1004.eqiad.wmnet
13:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
13:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47766 and previous config saved to /var/cache/conftool/dbconfig/20230505-132050-ladsgroup.json
13:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1004.eqiad.wmnet
13:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47765 and previous config saved to /var/cache/conftool/dbconfig/20230505-130544-ladsgroup.json
13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
12:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47764 and previous config saved to /var/cache/conftool/dbconfig/20230505-125038-ladsgroup.json
12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47763 and previous config saved to /var/cache/conftool/dbconfig/20230505-124412-ladsgroup.json
12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47762 and previous config saved to /var/cache/conftool/dbconfig/20230505-124349-ladsgroup.json
12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47761 and previous config saved to /var/cache/conftool/dbconfig/20230505-122843-ladsgroup.json
12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47760 and previous config saved to /var/cache/conftool/dbconfig/20230505-121336-ladsgroup.json
12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
11:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47759 and previous config saved to /var/cache/conftool/dbconfig/20230505-115830-ladsgroup.json
11:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47758 and previous config saved to /var/cache/conftool/dbconfig/20230505-115126-ladsgroup.json
11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47757 and previous config saved to /var/cache/conftool/dbconfig/20230505-112649-ladsgroup.json
11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47756 and previous config saved to /var/cache/conftool/dbconfig/20230505-112605-ladsgroup.json
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47755 and previous config saved to /var/cache/conftool/dbconfig/20230505-111145-ladsgroup.json
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47754 and previous config saved to /var/cache/conftool/dbconfig/20230505-111100-ladsgroup.json
10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47753 and previous config saved to /var/cache/conftool/dbconfig/20230505-105640-ladsgroup.json
10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47752 and previous config saved to /var/cache/conftool/dbconfig/20230505-105555-ladsgroup.json
10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47751 and previous config saved to /var/cache/conftool/dbconfig/20230505-104135-ladsgroup.json
10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47750 and previous config saved to /var/cache/conftool/dbconfig/20230505-104050-ladsgroup.json
09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
09:14 Amir1: power cycled db1170
09:10 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015
09:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@8aba801]: deploying to host missing from configs (duration: 01m 22s)
09:06 hnowlan@deploy1002: Started deploy [restbase/deploy@8aba801]: deploying to host missing from configs
08:58 XioNoX: deploy CR914772 on all hosts running Bird
08:15 godog: delete wal and chunks_head from prometheus5002 and prometheus4002 to let prometheus start back up and not crashloop - T309979
08:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm
08:04 hashar@deploy1002: Started deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0
07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
06:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
06:51 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow2003.codfw.wmnet
06:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
06:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
06:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
06:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
06:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
06:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
06:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136907
06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow2003.codfw.wmnet on all recursors
06:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow2003.codfw.wmnet on all recursors
06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
06:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
06:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136907
06:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow2003.codfw.wmnet
06:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
06:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
06:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
06:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
06:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
06:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
05:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47748 and previous config saved to /var/cache/conftool/dbconfig/20230505-050007-ladsgroup.json
04:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47747 and previous config saved to /var/cache/conftool/dbconfig/20230505-044500-ladsgroup.json
04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47746 and previous config saved to /var/cache/conftool/dbconfig/20230505-042954-ladsgroup.json
04:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
04:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47745 and previous config saved to /var/cache/conftool/dbconfig/20230505-041448-ladsgroup.json
04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47744 and previous config saved to /var/cache/conftool/dbconfig/20230505-040837-ladsgroup.json
04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47743 and previous config saved to /var/cache/conftool/dbconfig/20230505-040812-ladsgroup.json
04:04 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47742 and previous config saved to /var/cache/conftool/dbconfig/20230505-035306-ladsgroup.json
03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47741 and previous config saved to /var/cache/conftool/dbconfig/20230505-033800-ladsgroup.json
03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47740 and previous config saved to /var/cache/conftool/dbconfig/20230505-032253-ladsgroup.json
03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47739 and previous config saved to /var/cache/conftool/dbconfig/20230505-031637-ladsgroup.json
03:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47738 and previous config saved to /var/cache/conftool/dbconfig/20230505-030130-ladsgroup.json
02:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47737 and previous config saved to /var/cache/conftool/dbconfig/20230505-024624-ladsgroup.json
02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47736 and previous config saved to /var/cache/conftool/dbconfig/20230505-023118-ladsgroup.json
02:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47735 and previous config saved to /var/cache/conftool/dbconfig/20230505-022510-ladsgroup.json
02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47734 and previous config saved to /var/cache/conftool/dbconfig/20230505-022446-ladsgroup.json
02:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
02:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47733 and previous config saved to /var/cache/conftool/dbconfig/20230505-022421-ladsgroup.json
02:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47732 and previous config saved to /var/cache/conftool/dbconfig/20230505-020915-ladsgroup.json
01:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47731 and previous config saved to /var/cache/conftool/dbconfig/20230505-015409-ladsgroup.json
01:49 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
01:45 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus6002.drmrs.wmnet
01:41 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
01:40 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
01:39 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus6002.drmrs.wmnet
01:39 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus5002.eqsin.wmnet
01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47730 and previous config saved to /var/cache/conftool/dbconfig/20230505-013903-ladsgroup.json
01:32 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus5002.eqsin.wmnet
01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47729 and previous config saved to /var/cache/conftool/dbconfig/20230505-013232-ladsgroup.json
01:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
01:32 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus4002.ulsfo.wmnet
01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47728 and previous config saved to /var/cache/conftool/dbconfig/20230505-013206-ladsgroup.json
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47727 and previous config saved to /var/cache/conftool/dbconfig/20230505-013108-ladsgroup.json
01:31 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
01:30 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47726 and previous config saved to /var/cache/conftool/dbconfig/20230505-012950-ladsgroup.json
01:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
01:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47725 and previous config saved to /var/cache/conftool/dbconfig/20230505-012927-ladsgroup.json
01:26 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus4002.ulsfo.wmnet
01:25 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus3002.esams.wmnet
01:21 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
01:20 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
01:18 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus3002.esams.wmnet
01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47724 and previous config saved to /var/cache/conftool/dbconfig/20230505-011700-ladsgroup.json
01:16 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47723 and previous config saved to /var/cache/conftool/dbconfig/20230505-011421-ladsgroup.json
01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47722 and previous config saved to /var/cache/conftool/dbconfig/20230505-010154-ladsgroup.json
00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47721 and previous config saved to /var/cache/conftool/dbconfig/20230505-005914-ladsgroup.json
00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47720 and previous config saved to /var/cache/conftool/dbconfig/20230505-004648-ladsgroup.json
00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47719 and previous config saved to /var/cache/conftool/dbconfig/20230505-004408-ladsgroup.json
00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47718 and previous config saved to /var/cache/conftool/dbconfig/20230505-003914-ladsgroup.json
00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47717 and previous config saved to /var/cache/conftool/dbconfig/20230505-003845-ladsgroup.json
00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47716 and previous config saved to /var/cache/conftool/dbconfig/20230505-003749-ladsgroup.json
00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47715 and previous config saved to /var/cache/conftool/dbconfig/20230505-003359-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47714 and previous config saved to /var/cache/conftool/dbconfig/20230505-002339-ladsgroup.json
00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47713 and previous config saved to /var/cache/conftool/dbconfig/20230505-001853-ladsgroup.json
00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47712 and previous config saved to /var/cache/conftool/dbconfig/20230505-000832-ladsgroup.json
00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47711 and previous config saved to /var/cache/conftool/dbconfig/20230505-000346-ladsgroup.json
2023-05-04
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47710 and previous config saved to /var/cache/conftool/dbconfig/20230504-235326-ladsgroup.json
23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47709 and previous config saved to /var/cache/conftool/dbconfig/20230504-234840-ladsgroup.json
23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47708 and previous config saved to /var/cache/conftool/dbconfig/20230504-234544-ladsgroup.json
23:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47707 and previous config saved to /var/cache/conftool/dbconfig/20230504-234520-ladsgroup.json
23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47706 and previous config saved to /var/cache/conftool/dbconfig/20230504-234330-ladsgroup.json
23:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
23:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47705 and previous config saved to /var/cache/conftool/dbconfig/20230504-234306-ladsgroup.json
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47704 and previous config saved to /var/cache/conftool/dbconfig/20230504-233013-ladsgroup.json
23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47703 and previous config saved to /var/cache/conftool/dbconfig/20230504-232800-ladsgroup.json
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47702 and previous config saved to /var/cache/conftool/dbconfig/20230504-231507-ladsgroup.json
23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47701 and previous config saved to /var/cache/conftool/dbconfig/20230504-231254-ladsgroup.json
23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47700 and previous config saved to /var/cache/conftool/dbconfig/20230504-230001-ladsgroup.json
22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47699 and previous config saved to /var/cache/conftool/dbconfig/20230504-225747-ladsgroup.json
22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47698 and previous config saved to /var/cache/conftool/dbconfig/20230504-225336-ladsgroup.json
22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47697 and previous config saved to /var/cache/conftool/dbconfig/20230504-225013-ladsgroup.json
22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47696 and previous config saved to /var/cache/conftool/dbconfig/20230504-224646-ladsgroup.json
22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47695 and previous config saved to /var/cache/conftool/dbconfig/20230504-223139-ladsgroup.json
22:22 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47694 and previous config saved to /var/cache/conftool/dbconfig/20230504-221633-ladsgroup.json
22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47693 and previous config saved to /var/cache/conftool/dbconfig/20230504-220127-ladsgroup.json
21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47692 and previous config saved to /var/cache/conftool/dbconfig/20230504-215511-ladsgroup.json
21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47691 and previous config saved to /var/cache/conftool/dbconfig/20230504-215447-ladsgroup.json
21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47690 and previous config saved to /var/cache/conftool/dbconfig/20230504-213941-ladsgroup.json
21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47689 and previous config saved to /var/cache/conftool/dbconfig/20230504-212434-ladsgroup.json
21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47688 and previous config saved to /var/cache/conftool/dbconfig/20230504-210928-ladsgroup.json
21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47687 and previous config saved to /var/cache/conftool/dbconfig/20230504-210513-ladsgroup.json
21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47686 and previous config saved to /var/cache/conftool/dbconfig/20230504-210057-ladsgroup.json
21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47685 and previous config saved to /var/cache/conftool/dbconfig/20230504-210033-ladsgroup.json
20:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47684 and previous config saved to /var/cache/conftool/dbconfig/20230504-205007-ladsgroup.json
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47683 and previous config saved to /var/cache/conftool/dbconfig/20230504-204527-ladsgroup.json
20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47682 and previous config saved to /var/cache/conftool/dbconfig/20230504-203501-ladsgroup.json
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47681 and previous config saved to /var/cache/conftool/dbconfig/20230504-203021-ladsgroup.json
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47680 and previous config saved to /var/cache/conftool/dbconfig/20230504-201955-ladsgroup.json
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47679 and previous config saved to /var/cache/conftool/dbconfig/20230504-201514-ladsgroup.json
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47678 and previous config saved to /var/cache/conftool/dbconfig/20230504-201332-ladsgroup.json
20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47677 and previous config saved to /var/cache/conftool/dbconfig/20230504-201306-ladsgroup.json
20:08 brennen@deploy1002: brennen and jdlrobson: Backport for Fix file page integration (T335997) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
20:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47675 and previous config saved to /var/cache/conftool/dbconfig/20230504-200141-ladsgroup.json
20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47674 and previous config saved to /var/cache/conftool/dbconfig/20230504-200131-ladsgroup.json
20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
20:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
20:00 mutante: people2002 (people.wikimedia.org) reboot, <1 min downtime
19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47673 and previous config saved to /var/cache/conftool/dbconfig/20230504-195800-ladsgroup.json
19:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47672 and previous config saved to /var/cache/conftool/dbconfig/20230504-194635-ladsgroup.json
19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47671 and previous config saved to /var/cache/conftool/dbconfig/20230504-194624-ladsgroup.json
19:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47670 and previous config saved to /var/cache/conftool/dbconfig/20230504-194254-ladsgroup.json
19:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47669 and previous config saved to /var/cache/conftool/dbconfig/20230504-193129-ladsgroup.json
19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47668 and previous config saved to /var/cache/conftool/dbconfig/20230504-193118-ladsgroup.json
19:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
19:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47667 and previous config saved to /var/cache/conftool/dbconfig/20230504-192747-ladsgroup.json
19:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
19:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47666 and previous config saved to /var/cache/conftool/dbconfig/20230504-191623-ladsgroup.json
19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47665 and previous config saved to /var/cache/conftool/dbconfig/20230504-191612-ladsgroup.json
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47664 and previous config saved to /var/cache/conftool/dbconfig/20230504-191528-ladsgroup.json
19:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
19:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47663 and previous config saved to /var/cache/conftool/dbconfig/20230504-191001-ladsgroup.json
19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47662 and previous config saved to /var/cache/conftool/dbconfig/20230504-190937-ladsgroup.json
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47661 and previous config saved to /var/cache/conftool/dbconfig/20230504-190757-ladsgroup.json
19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
19:02 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47660 and previous config saved to /var/cache/conftool/dbconfig/20230504-190022-ladsgroup.json
18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47659 and previous config saved to /var/cache/conftool/dbconfig/20230504-185431-ladsgroup.json
18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47658 and previous config saved to /var/cache/conftool/dbconfig/20230504-185250-ladsgroup.json
18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
18:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47657 and previous config saved to /var/cache/conftool/dbconfig/20230504-184516-ladsgroup.json
18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47656 and previous config saved to /var/cache/conftool/dbconfig/20230504-183925-ladsgroup.json
18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47655 and previous config saved to /var/cache/conftool/dbconfig/20230504-183744-ladsgroup.json
18:37 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
18:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
18:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47654 and previous config saved to /var/cache/conftool/dbconfig/20230504-183010-ladsgroup.json
18:28 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
18:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47653 and previous config saved to /var/cache/conftool/dbconfig/20230504-182418-ladsgroup.json
18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47652 and previous config saved to /var/cache/conftool/dbconfig/20230504-182301-ladsgroup.json
18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47651 and previous config saved to /var/cache/conftool/dbconfig/20230504-182238-ladsgroup.json
18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47650 and previous config saved to /var/cache/conftool/dbconfig/20230504-182139-ladsgroup.json
18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47649 and previous config saved to /var/cache/conftool/dbconfig/20230504-182114-ladsgroup.json
18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47648 and previous config saved to /var/cache/conftool/dbconfig/20230504-181851-ladsgroup.json
18:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
18:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47647 and previous config saved to /var/cache/conftool/dbconfig/20230504-181828-ladsgroup.json
18:17 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47646 and previous config saved to /var/cache/conftool/dbconfig/20230504-181636-ladsgroup.json
18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330213
18:15 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47645 and previous config saved to /var/cache/conftool/dbconfig/20230504-181516-ladsgroup.json
18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
18:14 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47644 and previous config saved to /var/cache/conftool/dbconfig/20230504-181451-ladsgroup.json
18:11 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
18:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
18:08 brennen: train 1.41.0-wmf.7 (T330213): logs fairly quiet and no current blockers, rolling to group2
18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47643 and previous config saved to /var/cache/conftool/dbconfig/20230504-180608-ladsgroup.json
18:05 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
18:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
18:04 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47642 and previous config saved to /var/cache/conftool/dbconfig/20230504-180322-ladsgroup.json
17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47641 and previous config saved to /var/cache/conftool/dbconfig/20230504-175945-ladsgroup.json
17:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
17:54 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
17:53 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47640 and previous config saved to /var/cache/conftool/dbconfig/20230504-175102-ladsgroup.json
17:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
17:48 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47639 and previous config saved to /var/cache/conftool/dbconfig/20230504-174815-ladsgroup.json
17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47638 and previous config saved to /var/cache/conftool/dbconfig/20230504-174438-ladsgroup.json
17:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
17:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47637 and previous config saved to /var/cache/conftool/dbconfig/20230504-174040-ladsgroup.json
17:37 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47635 and previous config saved to /var/cache/conftool/dbconfig/20230504-173555-ladsgroup.json
17:35 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47634 and previous config saved to /var/cache/conftool/dbconfig/20230504-173309-ladsgroup.json
17:32 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
17:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
17:32 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
17:31 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
17:31 mutante: people1003 - rebooting
17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
17:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
17:30 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47633 and previous config saved to /var/cache/conftool/dbconfig/20230504-172932-ladsgroup.json
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47632 and previous config saved to /var/cache/conftool/dbconfig/20230504-172835-ladsgroup.json
17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47631 and previous config saved to /var/cache/conftool/dbconfig/20230504-172806-ladsgroup.json
17:26 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
17:25 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47630 and previous config saved to /var/cache/conftool/dbconfig/20230504-172546-ladsgroup.json
17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47629 and previous config saved to /var/cache/conftool/dbconfig/20230504-172534-ladsgroup.json
17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47628 and previous config saved to /var/cache/conftool/dbconfig/20230504-172523-ladsgroup.json
17:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47627 and previous config saved to /var/cache/conftool/dbconfig/20230504-172228-ladsgroup.json
17:22 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[11-21].eqiad.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
17:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
17:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47626 and previous config saved to /var/cache/conftool/dbconfig/20230504-172204-ladsgroup.json
17:16 mutante: aphlict2001 - not active, rebooting
17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47625 and previous config saved to /var/cache/conftool/dbconfig/20230504-171300-ladsgroup.json
17:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47624 and previous config saved to /var/cache/conftool/dbconfig/20230504-171028-ladsgroup.json
17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47623 and previous config saved to /var/cache/conftool/dbconfig/20230504-171017-ladsgroup.json
17:09 brennen: phab1004 deployed and restarted, phab up, MR widget still seems to work
17:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
17:08 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 34s)
17:07 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004
17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47622 and previous config saved to /var/cache/conftool/dbconfig/20230504-170658-ladsgroup.json
17:05 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab2002 (duration: 00m 37s)
17:05 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab2002
17:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
17:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
17:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
17:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
17:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
16:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47621 and previous config saved to /var/cache/conftool/dbconfig/20230504-165753-ladsgroup.json
16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47620 and previous config saved to /var/cache/conftool/dbconfig/20230504-165521-ladsgroup.json
16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47619 and previous config saved to /var/cache/conftool/dbconfig/20230504-165511-ladsgroup.json
16:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47618 and previous config saved to /var/cache/conftool/dbconfig/20230504-165152-ladsgroup.json
16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47617 and previous config saved to /var/cache/conftool/dbconfig/20230504-164850-ladsgroup.json
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47616 and previous config saved to /var/cache/conftool/dbconfig/20230504-164826-ladsgroup.json
16:46 sbassett@deploy1002: sbassett: Backport for Re-enable the Graph extension on test2wiki (T334940) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
16:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47615 and previous config saved to /var/cache/conftool/dbconfig/20230504-164247-ladsgroup.json
16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47614 and previous config saved to /var/cache/conftool/dbconfig/20230504-164004-ladsgroup.json
16:39 jynus: extending logical volume of backup1003, backup2003 for backup storage
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47613 and previous config saved to /var/cache/conftool/dbconfig/20230504-163646-ladsgroup.json
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47612 and previous config saved to /var/cache/conftool/dbconfig/20230504-163626-ladsgroup.json
16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
16:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47611 and previous config saved to /var/cache/conftool/dbconfig/20230504-163601-ladsgroup.json
16:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
16:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47610 and previous config saved to /var/cache/conftool/dbconfig/20230504-163319-ladsgroup.json
16:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47609 and previous config saved to /var/cache/conftool/dbconfig/20230504-163149-ladsgroup.json
16:30 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 152m 23s)
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47608 and previous config saved to /var/cache/conftool/dbconfig/20230504-162926-ladsgroup.json
16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47607 and previous config saved to /var/cache/conftool/dbconfig/20230504-162902-ladsgroup.json
16:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
16:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
16:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
16:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2002.codfw.wmnet
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47606 and previous config saved to /var/cache/conftool/dbconfig/20230504-162055-ladsgroup.json
16:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47605 and previous config saved to /var/cache/conftool/dbconfig/20230504-161813-ladsgroup.json
16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47604 and previous config saved to /var/cache/conftool/dbconfig/20230504-161643-ladsgroup.json
16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2002.codfw.wmnet
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47603 and previous config saved to /var/cache/conftool/dbconfig/20230504-161356-ladsgroup.json
16:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
16:12 mutante: doc1003 - rebooting
16:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
16:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
16:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
16:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
16:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47602 and previous config saved to /var/cache/conftool/dbconfig/20230504-160547-ladsgroup.json
16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47601 and previous config saved to /var/cache/conftool/dbconfig/20230504-160307-ladsgroup.json
16:02 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47600 and previous config saved to /var/cache/conftool/dbconfig/20230504-160136-ladsgroup.json
16:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47599 and previous config saved to /var/cache/conftool/dbconfig/20230504-155850-ladsgroup.json
15:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47598 and previous config saved to /var/cache/conftool/dbconfig/20230504-155544-ladsgroup.json
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47597 and previous config saved to /var/cache/conftool/dbconfig/20230504-155518-ladsgroup.json
15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
15:54 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
15:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
15:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47596 and previous config saved to /var/cache/conftool/dbconfig/20230504-155041-ladsgroup.json
15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
15:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47595 and previous config saved to /var/cache/conftool/dbconfig/20230504-154630-ladsgroup.json
15:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47594 and previous config saved to /var/cache/conftool/dbconfig/20230504-154344-ladsgroup.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47593 and previous config saved to /var/cache/conftool/dbconfig/20230504-154211-ladsgroup.json
15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47592 and previous config saved to /var/cache/conftool/dbconfig/20230504-154146-ladsgroup.json
15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47591 and previous config saved to /var/cache/conftool/dbconfig/20230504-154021-ladsgroup.json
15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47590 and previous config saved to /var/cache/conftool/dbconfig/20230504-154012-ladsgroup.json
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47589 and previous config saved to /var/cache/conftool/dbconfig/20230504-153850-ladsgroup.json
15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47588 and previous config saved to /var/cache/conftool/dbconfig/20230504-153834-ladsgroup.json
15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47587 and previous config saved to /var/cache/conftool/dbconfig/20230504-153825-ladsgroup.json
15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47586 and previous config saved to /var/cache/conftool/dbconfig/20230504-153810-ladsgroup.json
15:38 mutante: doc2002 - rebooting
15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
15:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
15:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47585 and previous config saved to /var/cache/conftool/dbconfig/20230504-152640-ladsgroup.json
15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47584 and previous config saved to /var/cache/conftool/dbconfig/20230504-152506-ladsgroup.json
15:24 marostegui: Failover m1-master from dbproxy1012 to dbproxy1014
15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47583 and previous config saved to /var/cache/conftool/dbconfig/20230504-152319-ladsgroup.json
15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47582 and previous config saved to /var/cache/conftool/dbconfig/20230504-152304-ladsgroup.json
15:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47581 and previous config saved to /var/cache/conftool/dbconfig/20230504-151133-ladsgroup.json
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47580 and previous config saved to /var/cache/conftool/dbconfig/20230504-151000-ladsgroup.json
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47579 and previous config saved to /var/cache/conftool/dbconfig/20230504-150813-ladsgroup.json
15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47578 and previous config saved to /var/cache/conftool/dbconfig/20230504-150758-ladsgroup.json
15:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47576 and previous config saved to /var/cache/conftool/dbconfig/20230504-150336-ladsgroup.json
15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47575 and previous config saved to /var/cache/conftool/dbconfig/20230504-150307-ladsgroup.json
14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47574 and previous config saved to /var/cache/conftool/dbconfig/20230504-145627-ladsgroup.json
14:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47573 and previous config saved to /var/cache/conftool/dbconfig/20230504-145307-ladsgroup.json
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47572 and previous config saved to /var/cache/conftool/dbconfig/20230504-145251-ladsgroup.json
14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47571 and previous config saved to /var/cache/conftool/dbconfig/20230504-145153-ladsgroup.json
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47570 and previous config saved to /var/cache/conftool/dbconfig/20230504-144852-ladsgroup.json
14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47569 and previous config saved to /var/cache/conftool/dbconfig/20230504-144827-ladsgroup.json
14:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47568 and previous config saved to /var/cache/conftool/dbconfig/20230504-144801-ladsgroup.json
14:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47567 and previous config saved to /var/cache/conftool/dbconfig/20230504-144625-ladsgroup.json
14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
14:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47566 and previous config saved to /var/cache/conftool/dbconfig/20230504-144110-ladsgroup.json
14:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
14:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
14:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47565 and previous config saved to /var/cache/conftool/dbconfig/20230504-143647-ladsgroup.json
14:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
14:34 eevans@cumin1001: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47564 and previous config saved to /var/cache/conftool/dbconfig/20230504-143320-ladsgroup.json
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47563 and previous config saved to /var/cache/conftool/dbconfig/20230504-143255-ladsgroup.json
14:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
14:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47562 and previous config saved to /var/cache/conftool/dbconfig/20230504-142604-ladsgroup.json
14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47561 and previous config saved to /var/cache/conftool/dbconfig/20230504-142140-ladsgroup.json
14:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47560 and previous config saved to /var/cache/conftool/dbconfig/20230504-141814-ladsgroup.json
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47559 and previous config saved to /var/cache/conftool/dbconfig/20230504-141749-ladsgroup.json
14:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1004.eqiad.wmnet
14:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
14:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1004.eqiad.wmnet
14:12 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1005.eqiad.wmnet
14:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47558 and previous config saved to /var/cache/conftool/dbconfig/20230504-141057-ladsgroup.json
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47557 and previous config saved to /var/cache/conftool/dbconfig/20230504-141024-ladsgroup.json
14:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
14:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47556 and previous config saved to /var/cache/conftool/dbconfig/20230504-140958-ladsgroup.json
14:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
14:08 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1005.eqiad.wmnet
14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1006.eqiad.wmnet
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47555 and previous config saved to /var/cache/conftool/dbconfig/20230504-140634-ladsgroup.json
14:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
14:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1006.eqiad.wmnet
14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47554 and previous config saved to /var/cache/conftool/dbconfig/20230504-140308-ladsgroup.json
14:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47553 and previous config saved to /var/cache/conftool/dbconfig/20230504-140012-ladsgroup.json
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47552 and previous config saved to /var/cache/conftool/dbconfig/20230504-135845-ladsgroup.json
13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47551 and previous config saved to /var/cache/conftool/dbconfig/20230504-135821-ladsgroup.json
13:58 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
13:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47550 and previous config saved to /var/cache/conftool/dbconfig/20230504-135637-ladsgroup.json
13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47549 and previous config saved to /var/cache/conftool/dbconfig/20230504-135612-ladsgroup.json
13:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47548 and previous config saved to /var/cache/conftool/dbconfig/20230504-135551-ladsgroup.json
13:54 Lucas_WMDE: UTC afternoon backport+config window done
13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47547 and previous config saved to /var/cache/conftool/dbconfig/20230504-135452-ladsgroup.json
13:53 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47546 and previous config saved to /var/cache/conftool/dbconfig/20230504-135135-ladsgroup.json
13:48 herron: switching to bullseye kafka monitoring hosts T335424
13:48 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1004.eqiad.wmnet
13:47 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:43 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1004.eqiad.wmnet
13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47545 and previous config saved to /var/cache/conftool/dbconfig/20230504-134315-ladsgroup.json
13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1005.eqiad.wmnet
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47544 and previous config saved to /var/cache/conftool/dbconfig/20230504-134106-ladsgroup.json
13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47543 and previous config saved to /var/cache/conftool/dbconfig/20230504-133945-ladsgroup.json
13:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1005.eqiad.wmnet
13:38 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:37 elukey: revert "Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733"
13:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1006.eqiad.wmnet
13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47542 and previous config saved to /var/cache/conftool/dbconfig/20230504-133628-ladsgroup.json
13:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2002.codfw.wmnet
13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47541 and previous config saved to /var/cache/conftool/dbconfig/20230504-132809-ladsgroup.json
13:27 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2002.codfw.wmnet
13:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2003.codfw.wmnet
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47540 and previous config saved to /var/cache/conftool/dbconfig/20230504-132600-ladsgroup.json
13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47539 and previous config saved to /var/cache/conftool/dbconfig/20230504-132439-ladsgroup.json
13:23 jdrewniak@deploy1002: jdrewniak: Backport for Enable Vector 2022 as the default skin on eswiki (T335686) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:23 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2003.codfw.wmnet
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47538 and previous config saved to /var/cache/conftool/dbconfig/20230504-132122-ladsgroup.json
13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47537 and previous config saved to /var/cache/conftool/dbconfig/20230504-131621-ladsgroup.json
13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47536 and previous config saved to /var/cache/conftool/dbconfig/20230504-131302-ladsgroup.json
13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47535 and previous config saved to /var/cache/conftool/dbconfig/20230504-131054-ladsgroup.json
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47534 and previous config saved to /var/cache/conftool/dbconfig/20230504-130616-ladsgroup.json
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47533 and previous config saved to /var/cache/conftool/dbconfig/20230504-130432-ladsgroup.json
13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47532 and previous config saved to /var/cache/conftool/dbconfig/20230504-130115-ladsgroup.json
12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47531 and previous config saved to /var/cache/conftool/dbconfig/20230504-125309-ladsgroup.json
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47530 and previous config saved to /var/cache/conftool/dbconfig/20230504-125250-ladsgroup.json
12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47529 and previous config saved to /var/cache/conftool/dbconfig/20230504-124609-ladsgroup.json
12:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
12:38 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47528 and previous config saved to /var/cache/conftool/dbconfig/20230504-123103-ladsgroup.json
12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47527 and previous config saved to /var/cache/conftool/dbconfig/20230504-122237-ladsgroup.json
12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47526 and previous config saved to /var/cache/conftool/dbconfig/20230504-122114-ladsgroup.json
12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47525 and previous config saved to /var/cache/conftool/dbconfig/20230504-122048-ladsgroup.json
12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47524 and previous config saved to /var/cache/conftool/dbconfig/20230504-121247-ladsgroup.json
12:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47523 and previous config saved to /var/cache/conftool/dbconfig/20230504-121224-ladsgroup.json
12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47522 and previous config saved to /var/cache/conftool/dbconfig/20230504-120542-ladsgroup.json
12:04 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47521 and previous config saved to /var/cache/conftool/dbconfig/20230504-115717-ladsgroup.json
11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47520 and previous config saved to /var/cache/conftool/dbconfig/20230504-115035-ladsgroup.json
11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47519 and previous config saved to /var/cache/conftool/dbconfig/20230504-114211-ladsgroup.json
11:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
11:38 kart_: Updated cxserver to 2023-05-03-044244-production (T333835, T335019, T331505)
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47518 and previous config saved to /var/cache/conftool/dbconfig/20230504-113529-ladsgroup.json
11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47516 and previous config saved to /var/cache/conftool/dbconfig/20230504-112705-ladsgroup.json
11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47515 and previous config saved to /var/cache/conftool/dbconfig/20230504-112650-ladsgroup.json
11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47514 and previous config saved to /var/cache/conftool/dbconfig/20230504-112625-ladsgroup.json
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47513 and previous config saved to /var/cache/conftool/dbconfig/20230504-112041-ladsgroup.json
11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47512 and previous config saved to /var/cache/conftool/dbconfig/20230504-112017-ladsgroup.json
11:15 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bullseye
11:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47511 and previous config saved to /var/cache/conftool/dbconfig/20230504-111119-ladsgroup.json
11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47510 and previous config saved to /var/cache/conftool/dbconfig/20230504-110511-ladsgroup.json
11:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5713
11:04 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
11:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5713
11:01 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47509 and previous config saved to /var/cache/conftool/dbconfig/20230504-105613-ladsgroup.json
10:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
10:54 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47508 and previous config saved to /var/cache/conftool/dbconfig/20230504-105005-ladsgroup.json
10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
10:48 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict2001.codfw.wmnet with OS bullseye
10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3002.wikimedia.org
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47507 and previous config saved to /var/cache/conftool/dbconfig/20230504-104107-ladsgroup.json
10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3002.wikimedia.org
10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
10:35 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47506 and previous config saved to /var/cache/conftool/dbconfig/20230504-103459-ladsgroup.json
10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47505 and previous config saved to /var/cache/conftool/dbconfig/20230504-103434-ladsgroup.json
10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47504 and previous config saved to /var/cache/conftool/dbconfig/20230504-103409-ladsgroup.json
10:28 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47503 and previous config saved to /var/cache/conftool/dbconfig/20230504-102835-ladsgroup.json
10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47502 and previous config saved to /var/cache/conftool/dbconfig/20230504-102812-ladsgroup.json
10:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
10:23 Amir1: Removing db1114 from zarcillo T335837
10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1114.eqiad.wmnet
10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:19 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47501 and previous config saved to /var/cache/conftool/dbconfig/20230504-101903-ladsgroup.json
10:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47500 and previous config saved to /var/cache/conftool/dbconfig/20230504-101306-ladsgroup.json
10:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1114.eqiad.wmnet
10:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47499 and previous config saved to /var/cache/conftool/dbconfig/20230504-100357-ladsgroup.json
10:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47498 and previous config saved to /var/cache/conftool/dbconfig/20230504-095800-ladsgroup.json
09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1114 from dbctl T335837', diff saved to https://phabricator.wikimedia.org/P47497 and previous config saved to /var/cache/conftool/dbconfig/20230504-094945-ladsgroup.json
09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47496 and previous config saved to /var/cache/conftool/dbconfig/20230504-094850-ladsgroup.json
09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47495 and previous config saved to /var/cache/conftool/dbconfig/20230504-094253-ladsgroup.json
09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47494 and previous config saved to /var/cache/conftool/dbconfig/20230504-094221-ladsgroup.json
09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47493 and previous config saved to /var/cache/conftool/dbconfig/20230504-094156-ladsgroup.json
09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47492 and previous config saved to /var/cache/conftool/dbconfig/20230504-093733-ladsgroup.json
09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47491 and previous config saved to /var/cache/conftool/dbconfig/20230504-093710-ladsgroup.json
09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1114 T335837', diff saved to https://phabricator.wikimedia.org/P47490 and previous config saved to /var/cache/conftool/dbconfig/20230504-093419-ladsgroup.json
09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47488 and previous config saved to /var/cache/conftool/dbconfig/20230504-092649-ladsgroup.json
09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47487 and previous config saved to /var/cache/conftool/dbconfig/20230504-092203-ladsgroup.json
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
09:12 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47486 and previous config saved to /var/cache/conftool/dbconfig/20230504-091143-ladsgroup.json
09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47485 and previous config saved to /var/cache/conftool/dbconfig/20230504-090657-ladsgroup.json
08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47484 and previous config saved to /var/cache/conftool/dbconfig/20230504-085637-ladsgroup.json
08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47483 and previous config saved to /var/cache/conftool/dbconfig/20230504-085151-ladsgroup.json
08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47482 and previous config saved to /var/cache/conftool/dbconfig/20230504-085008-ladsgroup.json
08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47481 and previous config saved to /var/cache/conftool/dbconfig/20230504-084741-ladsgroup.json
08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bookworm
07:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
06:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4004.wikimedia.org
06:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
06:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts test-reimage2001.codfw.wmnet
06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
06:07 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
04:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
04:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
04:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T335835
04:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
04:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
04:38 ryankemper: [Elastic] Reboot operation failed w/ (likely transient) read timeouts, will try again in 10 mins
04:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
04:36 ryankemper: [Elastic] Beginning rolling reboot of eqiad elastic, 3 nodes at a time, `ryankemper@cumin1001` tmux session `reboot_eqiad`
04:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
04:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
02:42 eileen: config revision changed from 5ac52d82 to 7ac11236 reduce batch size, avoid failmail
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47480 and previous config saved to /var/cache/conftool/dbconfig/20230503-203424-ladsgroup.json
20:33 cjming@deploy1002: jdlrobson and cjming: Backport for Explicitly enable MFCustomSiteModules (T270603) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:23 cjming@deploy1002: cjming and jdlrobson: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47479 and previous config saved to /var/cache/conftool/dbconfig/20230503-201918-ladsgroup.json
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47478 and previous config saved to /var/cache/conftool/dbconfig/20230503-200411-ladsgroup.json
19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47477 and previous config saved to /var/cache/conftool/dbconfig/20230503-194905-ladsgroup.json
19:43 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T335835
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47476 and previous config saved to /var/cache/conftool/dbconfig/20230503-194238-ladsgroup.json
19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47475 and previous config saved to /var/cache/conftool/dbconfig/20230503-194213-ladsgroup.json
19:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47474 and previous config saved to /var/cache/conftool/dbconfig/20230503-194045-ladsgroup.json
19:37 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47473 and previous config saved to /var/cache/conftool/dbconfig/20230503-192707-ladsgroup.json
19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47472 and previous config saved to /var/cache/conftool/dbconfig/20230503-192538-ladsgroup.json
19:20 inflatador: bking@cumin1001 reboot Elastic cluster for T335835
19:19 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47471 and previous config saved to /var/cache/conftool/dbconfig/20230503-191200-ladsgroup.json
19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47470 and previous config saved to /var/cache/conftool/dbconfig/20230503-191032-ladsgroup.json
19:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47469 and previous config saved to /var/cache/conftool/dbconfig/20230503-185654-ladsgroup.json
18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47468 and previous config saved to /var/cache/conftool/dbconfig/20230503-185526-ladsgroup.json
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47467 and previous config saved to /var/cache/conftool/dbconfig/20230503-185026-ladsgroup.json
18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47466 and previous config saved to /var/cache/conftool/dbconfig/20230503-184957-ladsgroup.json
18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47465 and previous config saved to /var/cache/conftool/dbconfig/20230503-184610-ladsgroup.json
18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47464 and previous config saved to /var/cache/conftool/dbconfig/20230503-184536-ladsgroup.json
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47463 and previous config saved to /var/cache/conftool/dbconfig/20230503-183451-ladsgroup.json
18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47462 and previous config saved to /var/cache/conftool/dbconfig/20230503-183030-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47461 and previous config saved to /var/cache/conftool/dbconfig/20230503-181944-ladsgroup.json
18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47460 and previous config saved to /var/cache/conftool/dbconfig/20230503-181524-ladsgroup.json
18:08 brennen: train 1.41.0-wmf.7 (T330213): logs quiet and no current blockers, rolling to group1
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47459 and previous config saved to /var/cache/conftool/dbconfig/20230503-180438-ladsgroup.json
18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47458 and previous config saved to /var/cache/conftool/dbconfig/20230503-180018-ladsgroup.json
17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47457 and previous config saved to /var/cache/conftool/dbconfig/20230503-175806-ladsgroup.json
17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47456 and previous config saved to /var/cache/conftool/dbconfig/20230503-175404-ladsgroup.json
17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47455 and previous config saved to /var/cache/conftool/dbconfig/20230503-175340-ladsgroup.json
17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47454 and previous config saved to /var/cache/conftool/dbconfig/20230503-175126-ladsgroup.json
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47453 and previous config saved to /var/cache/conftool/dbconfig/20230503-174330-ladsgroup.json
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47452 and previous config saved to /var/cache/conftool/dbconfig/20230503-173834-ladsgroup.json
17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47451 and previous config saved to /var/cache/conftool/dbconfig/20230503-173620-ladsgroup.json
17:32 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS bullseye
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47450 and previous config saved to /var/cache/conftool/dbconfig/20230503-172824-ladsgroup.json
17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47449 and previous config saved to /var/cache/conftool/dbconfig/20230503-172328-ladsgroup.json
17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47448 and previous config saved to /var/cache/conftool/dbconfig/20230503-172114-ladsgroup.json
17:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
17:15 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47447 and previous config saved to /var/cache/conftool/dbconfig/20230503-171317-ladsgroup.json
17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47446 and previous config saved to /var/cache/conftool/dbconfig/20230503-170821-ladsgroup.json
17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47445 and previous config saved to /var/cache/conftool/dbconfig/20230503-170607-ladsgroup.json
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
17:05 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 169m 01s)
17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47444 and previous config saved to /var/cache/conftool/dbconfig/20230503-165954-ladsgroup.json
16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47443 and previous config saved to /var/cache/conftool/dbconfig/20230503-165920-ladsgroup.json
16:58 herron@cumin1001: START - Cookbook sre.ganeti.reimage for host kafkamon2003.codfw.wmnet with OS bullseye
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47442 and previous config saved to /var/cache/conftool/dbconfig/20230503-165818-ladsgroup.json
16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47441 and previous config saved to /var/cache/conftool/dbconfig/20230503-165811-ladsgroup.json
16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47440 and previous config saved to /var/cache/conftool/dbconfig/20230503-165754-ladsgroup.json
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47438 and previous config saved to /var/cache/conftool/dbconfig/20230503-164622-ladsgroup.json
16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47437 and previous config saved to /var/cache/conftool/dbconfig/20230503-164557-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47436 and previous config saved to /var/cache/conftool/dbconfig/20230503-164414-ladsgroup.json
16:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47435 and previous config saved to /var/cache/conftool/dbconfig/20230503-164248-ladsgroup.json
16:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47434 and previous config saved to /var/cache/conftool/dbconfig/20230503-163051-ladsgroup.json
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47433 and previous config saved to /var/cache/conftool/dbconfig/20230503-162908-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47432 and previous config saved to /var/cache/conftool/dbconfig/20230503-162741-ladsgroup.json
16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47431 and previous config saved to /var/cache/conftool/dbconfig/20230503-161545-ladsgroup.json
16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47430 and previous config saved to /var/cache/conftool/dbconfig/20230503-161402-ladsgroup.json
16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2011.mgmt.codfw.wmnet with reboot policy FORCED
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47429 and previous config saved to /var/cache/conftool/dbconfig/20230503-161235-ladsgroup.json
16:08 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2001.codfw.wmnet
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47428 and previous config saved to /var/cache/conftool/dbconfig/20230503-160601-ladsgroup.json
16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47427 and previous config saved to /var/cache/conftool/dbconfig/20230503-160146-ladsgroup.json
16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47426 and previous config saved to /var/cache/conftool/dbconfig/20230503-160039-ladsgroup.json
15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47425 and previous config saved to /var/cache/conftool/dbconfig/20230503-155946-ladsgroup.json
15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47424 and previous config saved to /var/cache/conftool/dbconfig/20230503-155506-ladsgroup.json
15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47423 and previous config saved to /var/cache/conftool/dbconfig/20230503-155221-ladsgroup.json
15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47422 and previous config saved to /var/cache/conftool/dbconfig/20230503-154639-ladsgroup.json
15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47421 and previous config saved to /var/cache/conftool/dbconfig/20230503-154000-ladsgroup.json
15:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
15:37 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1002.eqiad.wmnet
15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47420 and previous config saved to /var/cache/conftool/dbconfig/20230503-153715-ladsgroup.json
15:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
15:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47419 and previous config saved to /var/cache/conftool/dbconfig/20230503-153133-ladsgroup.json
15:29 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47418 and previous config saved to /var/cache/conftool/dbconfig/20230503-152453-ladsgroup.json
15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47417 and previous config saved to /var/cache/conftool/dbconfig/20230503-152208-ladsgroup.json
15:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47416 and previous config saved to /var/cache/conftool/dbconfig/20230503-151627-ladsgroup.json
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47415 and previous config saved to /var/cache/conftool/dbconfig/20230503-151013-ladsgroup.json
15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47414 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47413 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47412 and previous config saved to /var/cache/conftool/dbconfig/20230503-150702-ladsgroup.json
15:03 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2012.codfw.wmnet: Upgrade Cassandra - T335383 - eevans@cumin1001
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47411 and previous config saved to /var/cache/conftool/dbconfig/20230503-150103-ladsgroup.json
15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47410 and previous config saved to /var/cache/conftool/dbconfig/20230503-150042-ladsgroup.json
15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
15:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47409 and previous config saved to /var/cache/conftool/dbconfig/20230503-150017-ladsgroup.json
14:59 sukhe: fix backup route for high-traffic2 in codfw: set routing-options static route 208.80.153.240/28 next-hop 10.192.17.7
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47408 and previous config saved to /var/cache/conftool/dbconfig/20230503-145440-ladsgroup.json
14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2007.codfw.wmnet
14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47407 and previous config saved to /var/cache/conftool/dbconfig/20230503-144511-ladsgroup.json
14:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
14:43 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733
14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt frav1003 - jclark@cumin1001"
14:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47406 and previous config saved to /var/cache/conftool/dbconfig/20230503-143933-ladsgroup.json
14:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2007.codfw.wmnet
14:33 sukhe: set routing-options static route 208.80.153.224/28 next-hop 10.192.49.7 [move static route for high-traffic1 to lvs2010]: T335777
14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47405 and previous config saved to /var/cache/conftool/dbconfig/20230503-143005-ladsgroup.json
14:26 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka main clusters - T334733
14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47404 and previous config saved to /var/cache/conftool/dbconfig/20230503-142427-ladsgroup.json
14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47403 and previous config saved to /var/cache/conftool/dbconfig/20230503-141817-ladsgroup.json
14:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47402 and previous config saved to /var/cache/conftool/dbconfig/20230503-141752-ladsgroup.json
14:16 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47401 and previous config saved to /var/cache/conftool/dbconfig/20230503-141458-ladsgroup.json
14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47399 and previous config saved to /var/cache/conftool/dbconfig/20230503-140908-ladsgroup.json
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47398 and previous config saved to /var/cache/conftool/dbconfig/20230503-140540-ladsgroup.json
14:03 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kafkamon2003.codfw.wmnet
14:03 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47396 and previous config saved to /var/cache/conftool/dbconfig/20230503-140246-ladsgroup.json
14:02 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47395 and previous config saved to /var/cache/conftool/dbconfig/20230503-135402-ladsgroup.json
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47394 and previous config saved to /var/cache/conftool/dbconfig/20230503-135034-ladsgroup.json
13:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafkamon2003.codfw.wmnet on all recursors
13:47 herron@cumin1001: START - Cookbook sre.dns.wipe-cache kafkamon2003.codfw.wmnet on all recursors
13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:47 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47393 and previous config saved to /var/cache/conftool/dbconfig/20230503-134740-ladsgroup.json
13:46 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
13:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
13:40 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47392 and previous config saved to /var/cache/conftool/dbconfig/20230503-133855-ladsgroup.json
13:36 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
13:36 slyngshede@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM idm-test1001.wikimedia.org
13:35 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47391 and previous config saved to /var/cache/conftool/dbconfig/20230503-133528-ladsgroup.json
13:34 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:34 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM idm-test1001.wikimedia.org
13:33 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47390 and previous config saved to /var/cache/conftool/dbconfig/20230503-133232-ladsgroup.json
13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47389 and previous config saved to /var/cache/conftool/dbconfig/20230503-133117-ladsgroup.json
13:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47388 and previous config saved to /var/cache/conftool/dbconfig/20230503-132349-ladsgroup.json
13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47387 and previous config saved to /var/cache/conftool/dbconfig/20230503-132022-ladsgroup.json
13:19 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47386 and previous config saved to /var/cache/conftool/dbconfig/20230503-131736-ladsgroup.json
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47385 and previous config saved to /var/cache/conftool/dbconfig/20230503-131656-ladsgroup.json
13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47384 and previous config saved to /var/cache/conftool/dbconfig/20230503-131611-ladsgroup.json
13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47383 and previous config saved to /var/cache/conftool/dbconfig/20230503-131414-ladsgroup.json
13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47382 and previous config saved to /var/cache/conftool/dbconfig/20230503-131249-ladsgroup.json
13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47381 and previous config saved to /var/cache/conftool/dbconfig/20230503-131224-ladsgroup.json
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47380 and previous config saved to /var/cache/conftool/dbconfig/20230503-130149-ladsgroup.json
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47379 and previous config saved to /var/cache/conftool/dbconfig/20230503-130105-ladsgroup.json
12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47378 and previous config saved to /var/cache/conftool/dbconfig/20230503-125718-ladsgroup.json
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47377 and previous config saved to /var/cache/conftool/dbconfig/20230503-124643-ladsgroup.json
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47376 and previous config saved to /var/cache/conftool/dbconfig/20230503-124558-ladsgroup.json
12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47375 and previous config saved to /var/cache/conftool/dbconfig/20230503-124212-ladsgroup.json
12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47374 and previous config saved to /var/cache/conftool/dbconfig/20230503-123837-ladsgroup.json
12:37 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47373 and previous config saved to /var/cache/conftool/dbconfig/20230503-123714-ladsgroup.json
12:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47372 and previous config saved to /var/cache/conftool/dbconfig/20230503-123649-ladsgroup.json
12:36 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
12:31 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47371 and previous config saved to /var/cache/conftool/dbconfig/20230503-123137-ladsgroup.json
12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47370 and previous config saved to /var/cache/conftool/dbconfig/20230503-122705-ladsgroup.json
12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47369 and previous config saved to /var/cache/conftool/dbconfig/20230503-122143-ladsgroup.json
12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47368 and previous config saved to /var/cache/conftool/dbconfig/20230503-122113-ladsgroup.json
12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47367 and previous config saved to /var/cache/conftool/dbconfig/20230503-122049-ladsgroup.json
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47366 and previous config saved to /var/cache/conftool/dbconfig/20230503-122040-ladsgroup.json
12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47365 and previous config saved to /var/cache/conftool/dbconfig/20230503-122000-ladsgroup.json
12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
12:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
12:11 Amir1: Removing db1111 from zarcillo T335836
12:09 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
12:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47364 and previous config saved to /var/cache/conftool/dbconfig/20230503-120637-ladsgroup.json
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1111.eqiad.wmnet
12:06 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47363 and previous config saved to /var/cache/conftool/dbconfig/20230503-120536-ladsgroup.json
12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47362 and previous config saved to /var/cache/conftool/dbconfig/20230503-120453-ladsgroup.json
12:02 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1111.eqiad.wmnet
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47361 and previous config saved to /var/cache/conftool/dbconfig/20230503-115130-ladsgroup.json
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1110 from dbctl T335011', diff saved to https://phabricator.wikimedia.org/P47360 and previous config saved to /var/cache/conftool/dbconfig/20230503-115124-marostegui.json
11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47359 and previous config saved to /var/cache/conftool/dbconfig/20230503-115030-ladsgroup.json
11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47358 and previous config saved to /var/cache/conftool/dbconfig/20230503-114947-ladsgroup.json
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47357 and previous config saved to /var/cache/conftool/dbconfig/20230503-114426-ladsgroup.json
11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47356 and previous config saved to /var/cache/conftool/dbconfig/20230503-114335-ladsgroup.json
11:40 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2015.codfw.wmnet
11:38 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2015.codfw.wmnet
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47355 and previous config saved to /var/cache/conftool/dbconfig/20230503-113524-ladsgroup.json
11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47354 and previous config saved to /var/cache/conftool/dbconfig/20230503-113441-ladsgroup.json
11:31 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:28 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47353 and previous config saved to /var/cache/conftool/dbconfig/20230503-112828-ladsgroup.json
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47352 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47351 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1111 from dbctl T335836', diff saved to https://phabricator.wikimedia.org/P47350 and previous config saved to /var/cache/conftool/dbconfig/20230503-112812-ladsgroup.json
11:27 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:23 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47349 and previous config saved to /var/cache/conftool/dbconfig/20230503-112037-root.json
11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47348 and previous config saved to /var/cache/conftool/dbconfig/20230503-111910-ladsgroup.json
11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Decom db1111 T335836', diff saved to https://phabricator.wikimedia.org/P47347 and previous config saved to /var/cache/conftool/dbconfig/20230503-111904-ladsgroup.json
11:18 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
11:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
11:11 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47346 and previous config saved to /var/cache/conftool/dbconfig/20230503-111145-ladsgroup.json
11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
11:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47345 and previous config saved to /var/cache/conftool/dbconfig/20230503-110532-root.json
11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47344 and previous config saved to /var/cache/conftool/dbconfig/20230503-110357-ladsgroup.json
11:02 marostegui: Reboot dbproxy200[1-4]
11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47343 and previous config saved to /var/cache/conftool/dbconfig/20230503-105639-ladsgroup.json
10:50 claime: Migrating recommendation-api codfw to mw-api-int-async - T334062
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47341 and previous config saved to /var/cache/conftool/dbconfig/20230503-105004-ladsgroup.json
10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47340 and previous config saved to /var/cache/conftool/dbconfig/20230503-104939-ladsgroup.json
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47339 and previous config saved to /var/cache/conftool/dbconfig/20230503-104851-ladsgroup.json
10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2004.codfw.wmnet
10:41 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
10:40 claime: Migrating recommendation-api staging to mw-api-int-async - T334062
10:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
10:38 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1004.eqiad.wmnet
10:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47338 and previous config saved to /var/cache/conftool/dbconfig/20230503-103523-root.json
10:35 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47337 and previous config saved to /var/cache/conftool/dbconfig/20230503-103433-ladsgroup.json
10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestagemaster2001.codfw.wmnet
10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47336 and previous config saved to /var/cache/conftool/dbconfig/20230503-103345-ladsgroup.json
10:33 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
10:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47335 and previous config saved to /var/cache/conftool/dbconfig/20230503-102719-ladsgroup.json
10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47334 and previous config saved to /var/cache/conftool/dbconfig/20230503-102654-ladsgroup.json
10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47333 and previous config saved to /var/cache/conftool/dbconfig/20230503-102018-root.json
10:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47332 and previous config saved to /var/cache/conftool/dbconfig/20230503-101926-ladsgroup.json
10:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
10:18 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
10:18 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab2003.wikimedia.org
10:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
10:16 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
10:16 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47331 and previous config saved to /var/cache/conftool/dbconfig/20230503-101147-ladsgroup.json
10:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
10:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
10:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
10:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47330 and previous config saved to /var/cache/conftool/dbconfig/20230503-100513-root.json
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47329 and previous config saved to /var/cache/conftool/dbconfig/20230503-100420-ladsgroup.json
10:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
10:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
10:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47328 and previous config saved to /var/cache/conftool/dbconfig/20230503-095901-ladsgroup.json
09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47327 and previous config saved to /var/cache/conftool/dbconfig/20230503-095641-ladsgroup.json
09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47325 and previous config saved to /var/cache/conftool/dbconfig/20230503-095008-root.json
09:47 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
09:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
09:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
09:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47324 and previous config saved to /var/cache/conftool/dbconfig/20230503-094135-ladsgroup.json
09:36 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47323 and previous config saved to /var/cache/conftool/dbconfig/20230503-093606-ladsgroup.json
09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47322 and previous config saved to /var/cache/conftool/dbconfig/20230503-093503-root.json
09:29 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
09:29 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47321 and previous config saved to /var/cache/conftool/dbconfig/20230503-092856-root.json
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47320 and previous config saved to /var/cache/conftool/dbconfig/20230503-092847-root.json
09:28 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
09:26 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P47319 and previous config saved to /var/cache/conftool/dbconfig/20230503-092513-root.json
09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
09:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
09:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
09:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47316 and previous config saved to /var/cache/conftool/dbconfig/20230503-085847-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47315 and previous config saved to /var/cache/conftool/dbconfig/20230503-085837-root.json
08:44 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47314 and previous config saved to /var/cache/conftool/dbconfig/20230503-084342-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47313 and previous config saved to /var/cache/conftool/dbconfig/20230503-084332-root.json
08:39 marostegui: dbmaint deploy schema change on eqiad s3 with replication T335834
08:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 2%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47303 and previous config saved to /var/cache/conftool/dbconfig/20230503-072818-root.json
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 2%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47302 and previous config saved to /var/cache/conftool/dbconfig/20230503-072808-root.json
07:26 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
07:20 taavi@deploy1002: taavi and samwilson: Backport for Remove duplicated diff-mode selector in save dialog (T324759) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47299 and previous config saved to /var/cache/conftool/dbconfig/20230503-071313-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47298 and previous config saved to /var/cache/conftool/dbconfig/20230503-071303-root.json
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1213 (s5,s6) to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47297 and previous config saved to /var/cache/conftool/dbconfig/20230503-071046-marostegui.json
07:09 moritzm: installing glibc bugfix updates from bullseye point release
07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1117.eqiad.wmnet
07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:01 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1117.eqiad.wmnet
06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
06:46 marostegui: Disconnect codfw -> eqiad replication on s1 T335267
06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
06:28 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
06:14 marostegui: Disconnect codfw -> eqiad replication on s8 T335267
06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
06:09 marostegui: Disconnect codfw -> eqiad replication on s4 T335267
06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
06:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
06:06 marostegui: Disconnect codfw -> eqiad replication on s7 T335267
06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
06:01 marostegui: Disconnect codfw -> eqiad replication on s3 T335267
06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
05:59 marostegui: Disconnect codfw -> eqiad replication on s5 T335267
05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
05:57 marostegui: Disconnect codfw -> eqiad replication on s2 T335267
05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
05:54 marostegui: Disconnect codfw -> eqiad replication on s6 T335267
05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
05:51 marostegui: Disconnect codfw -> eqiad replication on es5 T335267
05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
05:48 marostegui: Disconnect codfw -> eqiad replication on es4 T335267
05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
05:44 marostegui: Disconnect codfw -> eqiad replication on x1 T335267
05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
05:40 marostegui: Disconnect codfw -> eqiad replication on pc3 T335267
05:40 marostegui: Disconnect codfw -> eqiad replication on pc2 T335267
05:40 marostegui: Disconnect codfw -> eqiad replication on pc1 T335267
05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
19:05 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
19:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
15:36 claime: Re-running puppet on failed parse servers - T313227
15:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
15:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
15:16 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: codfw row C switches upgrade - T334049
15:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:16 urbanecm@deploy1002: urbanecm and wmde-fisch: Backport for Enable Kartographer Nearby on mobile (T333137) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:59 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row C switches upgrade - T334049
09:53 ladsgroup@deploy1002: ladsgroup: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
09:21 eoghan@cumin1001: END (ERROR) - Cookbook sre.gitlab.failover (exit_code=97) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
09:13 ladsgroup@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
07:05 taavi@deploy1002: taavi and mdsshakil: Backport for Enable WikiLove extension on bnwikibooks (T335705) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:06 taavi@deploy1002: legoktm and taavi: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet