22:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:44 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
21:36 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:35 kindrobot: close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
21:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
21:25 kindrobot@deploy1002: kindrobot and nray: Backport for Enable ClientPreferences for group0 (T327979) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request T328456" --skip-subpages # T328456
15:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
15:32 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Set 'groupLoadsBySection' for s11 (T326980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
14:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
08:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:45 elukey: restore previously removed password for keystore to kafka-logging clusters
08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
07:56 moritzm: installing bash bugfix updates from Bullseye point release
07:22 marostegui: dbmaint Schema change on s3 eqiad T328373
07:22 marostegui: dbmaint Schema change on s1 eqiad T328373
07:10 marostegui: Failover m2 from db1164 to db1195 - T328253
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:03 marostegui: dbmaint Schema change on s5 eqiad T328373
06:59 marostegui: dbmaint Schema change on s7 eqiad T328373
06:57 marostegui: dbmaint Schema change on s4 eqiad T328373
06:52 marostegui: dbmaint Schema change on s8 eqiad T328373
21:35 urbanecm@deploy1002: tgr and urbanecm: Backport for GrowthExperiments: Update campaign configuration (T321370) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
18:34 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS bullseye
18:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:29 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
18:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3052.esams.wmnet
18:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
18:04 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43517 and previous config saved to /var/cache/conftool/dbconfig/20230130-174957-ladsgroup.json
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43516 and previous config saved to /var/cache/conftool/dbconfig/20230130-173450-ladsgroup.json
17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
17:24 inflatador: bking@build2001 rebuilding docker images for 884351 complete
17:22 inflatador: bking@build2001 rebuilding docker images for 884351
17:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS bullseye
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43515 and previous config saved to /var/cache/conftool/dbconfig/20230130-171944-ladsgroup.json
17:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS bullseye
17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43514 and previous config saved to /var/cache/conftool/dbconfig/20230130-170437-ladsgroup.json
16:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43513 and previous config saved to /var/cache/conftool/dbconfig/20230130-165359-ladsgroup.json
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43512 and previous config saved to /var/cache/conftool/dbconfig/20230130-165348-ladsgroup.json
16:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43511 and previous config saved to /var/cache/conftool/dbconfig/20230130-163842-ladsgroup.json
16:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
16:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43510 and previous config saved to /var/cache/conftool/dbconfig/20230130-162336-ladsgroup.json
16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43508 and previous config saved to /var/cache/conftool/dbconfig/20230130-160829-ladsgroup.json
16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
16:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS bullseye
16:03 sukhe: racreset cp3050.esams.wmnet: firmware cookbook iDRAC upgrade test
16:03 moritzm: upgrading idp-test to latest Java security update
15:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43507 and previous config saved to /var/cache/conftool/dbconfig/20230130-155819-root.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43506 and previous config saved to /var/cache/conftool/dbconfig/20230130-155802-ladsgroup.json
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43505 and previous config saved to /var/cache/conftool/dbconfig/20230130-155747-ladsgroup.json
15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43504 and previous config saved to /var/cache/conftool/dbconfig/20230130-154314-root.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43503 and previous config saved to /var/cache/conftool/dbconfig/20230130-154241-ladsgroup.json
15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS bullseye
15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS bullseye
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43502 and previous config saved to /var/cache/conftool/dbconfig/20230130-152809-root.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43501 and previous config saved to /var/cache/conftool/dbconfig/20230130-152734-ladsgroup.json
15:14 marostegui: Retrospective: Starting s4 codfw failover from db2110 to db2140 - T328022
15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43500 and previous config saved to /var/cache/conftool/dbconfig/20230130-151304-root.json
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43499 and previous config saved to /var/cache/conftool/dbconfig/20230130-151228-ladsgroup.json
15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43498 and previous config saved to /var/cache/conftool/dbconfig/20230130-150132-ladsgroup.json
15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43497 and previous config saved to /var/cache/conftool/dbconfig/20230130-145759-root.json
14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T328022', diff saved to https://phabricator.wikimedia.org/P43496 and previous config saved to /var/cache/conftool/dbconfig/20230130-145508-root.json
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary T328022', diff saved to https://phabricator.wikimedia.org/P43495 and previous config saved to /var/cache/conftool/dbconfig/20230130-145421-root.json
14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43494 and previous config saved to /var/cache/conftool/dbconfig/20230130-145229-ladsgroup.json
14:47 moritzm: updating puppetdb 7 hosts to 7.12.1 T321783
14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS bullseye
14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43493 and previous config saved to /var/cache/conftool/dbconfig/20230130-144213-ladsgroup.json
14:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43492 and previous config saved to /var/cache/conftool/dbconfig/20230130-143723-ladsgroup.json
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43490 and previous config saved to /var/cache/conftool/dbconfig/20230130-142216-ladsgroup.json
14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T328022', diff saved to https://phabricator.wikimedia.org/P43489 and previous config saved to /var/cache/conftool/dbconfig/20230130-141822-root.json
14:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43488 and previous config saved to /var/cache/conftool/dbconfig/20230130-141203-ladsgroup.json
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43487 and previous config saved to /var/cache/conftool/dbconfig/20230130-140710-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43486 and previous config saved to /var/cache/conftool/dbconfig/20230130-135659-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43485 and previous config saved to /var/cache/conftool/dbconfig/20230130-135632-ladsgroup.json
13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43484 and previous config saved to /var/cache/conftool/dbconfig/20230130-134406-ladsgroup.json
13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 23s)
13:29 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
13:29 godog: bounce logstash on logstash1025 -- GC unhappy causing kafka lag
13:29 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 01m 13s)
13:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43483 and previous config saved to /var/cache/conftool/dbconfig/20230130-132701-ladsgroup.json
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43482 and previous config saved to /var/cache/conftool/dbconfig/20230130-131155-ladsgroup.json
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3004.wikimedia.org
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43481 and previous config saved to /var/cache/conftool/dbconfig/20230130-125648-ladsgroup.json
12:46 awight@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian (duration: 01m 27s)
12:45 awight@deploy1002: Started deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43479 and previous config saved to /var/cache/conftool/dbconfig/20230130-124142-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43478 and previous config saved to /var/cache/conftool/dbconfig/20230130-123004-ladsgroup.json
12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43477 and previous config saved to /var/cache/conftool/dbconfig/20230130-122943-ladsgroup.json
12:25 awight@deploy1002: Finished deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad (duration: 02m 44s)
12:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:23 awight@deploy1002: Started deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43476 and previous config saved to /var/cache/conftool/dbconfig/20230130-121437-ladsgroup.json
12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3004.wikimedia.org
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43475 and previous config saved to /var/cache/conftool/dbconfig/20230130-115930-ladsgroup.json
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6001.wikimedia.org
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6001.wikimedia.org
11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42473
11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42473
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43474 and previous config saved to /var/cache/conftool/dbconfig/20230130-114424-ladsgroup.json
11:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
11:41 Amir1: dropping old wikiadmin user (T326802)
11:35 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
11:35 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43473 and previous config saved to /var/cache/conftool/dbconfig/20230130-113319-ladsgroup.json
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43472 and previous config saved to /var/cache/conftool/dbconfig/20230130-113254-ladsgroup.json
11:28 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
11:24 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
11:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4002.wikimedia.org
11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43471 and previous config saved to /var/cache/conftool/dbconfig/20230130-111748-ladsgroup.json
11:17 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
11:11 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
11:09 phedenskog@deploy1002: Started deploy [performance/navtiming@4e5ff3f]: (no justification provided)
11:05 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43470 and previous config saved to /var/cache/conftool/dbconfig/20230130-110241-ladsgroup.json
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43468 and previous config saved to /var/cache/conftool/dbconfig/20230130-104735-ladsgroup.json
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4003.wikimedia.org
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43467 and previous config saved to /var/cache/conftool/dbconfig/20230130-103540-ladsgroup.json
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:30 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43466 and previous config saved to /var/cache/conftool/dbconfig/20230130-102500-ladsgroup.json
10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4003.wikimedia.org
10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
10:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43465 and previous config saved to /var/cache/conftool/dbconfig/20230130-100954-ladsgroup.json
10:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43464 and previous config saved to /var/cache/conftool/dbconfig/20230130-095447-ladsgroup.json
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43463 and previous config saved to /var/cache/conftool/dbconfig/20230130-093941-ladsgroup.json
09:29 jynus: disabling puppet on dbprov2004 to reorganize partitions T327155
09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43462 and previous config saved to /var/cache/conftool/dbconfig/20230130-092804-ladsgroup.json
09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43461 and previous config saved to /var/cache/conftool/dbconfig/20230130-092732-ladsgroup.json
09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43460 and previous config saved to /var/cache/conftool/dbconfig/20230130-091225-ladsgroup.json
08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43459 and previous config saved to /var/cache/conftool/dbconfig/20230130-085719-ladsgroup.json
08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43458 and previous config saved to /var/cache/conftool/dbconfig/20230130-085530-ladsgroup.json
08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43457 and previous config saved to /var/cache/conftool/dbconfig/20230130-084213-ladsgroup.json
08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43456 and previous config saved to /var/cache/conftool/dbconfig/20230130-084024-ladsgroup.json
08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43455 and previous config saved to /var/cache/conftool/dbconfig/20230130-083034-ladsgroup.json
08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43454 and previous config saved to /var/cache/conftool/dbconfig/20230130-082517-ladsgroup.json
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43452 and previous config saved to /var/cache/conftool/dbconfig/20230130-081011-ladsgroup.json
07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43451 and previous config saved to /var/cache/conftool/dbconfig/20230130-074502-ladsgroup.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43450 and previous config saved to /var/cache/conftool/dbconfig/20230130-073827-ladsgroup.json
07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43449 and previous config saved to /var/cache/conftool/dbconfig/20230130-073806-ladsgroup.json
07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43448 and previous config saved to /var/cache/conftool/dbconfig/20230130-072956-ladsgroup.json
07:26 marostegui: dbmaint Schema change on s7 eqiad T328236
07:25 marostegui: dbmaint Schema change on s2 eqiad T328236
07:25 marostegui: dbmaint Schema change on s1 eqiad T328236
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43447 and previous config saved to /var/cache/conftool/dbconfig/20230130-072300-ladsgroup.json
07:21 marostegui: dbmaint Schema change on s1 eqiad T328236
07:17 marostegui: dbmaint Schema change on s4 eqiad T328236
07:16 marostegui: dbmaint Schema change on s6 eqiad T328236
07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43446 and previous config saved to /var/cache/conftool/dbconfig/20230130-071450-ladsgroup.json
07:11 marostegui: dbmaint Schema change on s5 eqiad T328236
07:10 marostegui: dbmaint Schema change on s8 eqiad T328236
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43445 and previous config saved to /var/cache/conftool/dbconfig/20230130-070753-ladsgroup.json
07:05 marostegui: dbmaint Schema change on s3 eqiad T328086
07:02 marostegui: dbmaint Schema change on s1 eqiad T328086
07:01 marostegui: dbmaint Schema change on s4 eqiad T328086
06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43444 and previous config saved to /var/cache/conftool/dbconfig/20230130-065943-ladsgroup.json
06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43443 and previous config saved to /var/cache/conftool/dbconfig/20230130-065247-ladsgroup.json
06:51 marostegui: dbmaint Schema change on s5 eqiad T328086
06:45 marostegui: dbmaint Schema change on s2 eqiad T328086
06:43 marostegui: dbmaint Schema change on s7 eqiad T328086
06:41 marostegui: dbmaint Schema change on s8 eqiad T328086
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:34 marostegui: dbmaint Schema change on s6 eqiad T328086
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T318605)', diff saved to https://phabricator.wikimedia.org/P43441 and previous config saved to /var/cache/conftool/dbconfig/20230130-061534-ladsgroup.json
06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43440 and previous config saved to /var/cache/conftool/dbconfig/20230130-061401-ladsgroup.json
06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43439 and previous config saved to /var/cache/conftool/dbconfig/20230130-053033-ladsgroup.json
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
2023-01-29
14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:27 zabe@deploy1002: zabe: Backport for Stop setting cul_actor migration var (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
23:26 zabe@deploy1002: zabe and superpes: Backport for Add a project logo on gorwiktionary (T327987) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T328024
16:09 moritzm: installing distro-info-data updates from Bullseye point release
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
15:55 jbond: enable-puppet post deploy requestctl ferm change gerrit:883935
14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master T328023', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T328023', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:25 moritzm: restarting turnilo for nodejs security update
13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Change time zone setting on gorwiktionary (T327986) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: T323717
12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
07:25 dcausse: T322869: depooling wdqs2009 wdqs2010 wdqs2011 wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded
07:23 marostegui: Failover m1 from db1195 to db1176 - T327800
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:16 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Depool pc2011 (T327925) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T327861
14:57 urbanecm@deploy1002: urbanecm and migr: Backport for Enable the Wikibase REST API on Wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:04 jbond: enable puppet fleet wide to post deploy gerrit:883233
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 moritzm: installing libtasn security updates on buster
08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
2023-01-24
22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. T327813
22:13 samtar@deploy1002: samtar and stang: Backport for newiki: Add new permissions to group reviewer (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:11 samtar@deploy1002: daniel and samtar: Backport for Increase PC writes from parsoid API to 10% (T320534) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:48 XioNoX: restart ulsfo switches for network maintenance
12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:38 zabe@deploy1002: zabe: Backport for Remove PoolCounter from extension-list (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:49 XioNoX: depool ulsfo for network maintenance - T316532
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 T326116', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 T326116', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
10:33 vgutierrez: repool cp4046
10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:31 vgutierrez: restarting varnish on cp4046
10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:29 vgutierrez: depool cp4046
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
08:52 kartik@deploy1002: awight and kartik: Backport for Deprecate the EnableMapFrame feature flag (T326288) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T327745', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw T327745', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - T327745
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T327745', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API T327739', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 T327739', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T327739', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
00:03 zabe@deploy1002: zabe: Backport for Use core's PoolCounterClient (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
2023-01-23
21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for Enable Page Tools for logged-in users on enwiki (T327686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
15:44 papaul: ongoing maintenance on fasw-codfw
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
14:27 taavi@deploy1002: stang and taavi: Backport for zhwiki: Install PageAssessments (T326387) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
10:01 ladsgroup@deploy1002: ladsgroup: Backport for Remove Flow as default in techconductwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for Tweaks for new heading HTML structure (T327328 T327469) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Enable Page tools on viwiki and itwiki (T327348) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Fix grid blowout with limited width turned off (T327423) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
16:27 moritzm: installing cryptsetup updates for bullseye
16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
08:26 moritzm: installing sudo security updates
07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - T327372
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
2023-01-18
23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # T327290
21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
21:03 kindrobot: start UTC late backport window
20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Enable the REST API on test-wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [100%] English Wikipedia uses Vector 2022 skin synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Revert "Breaking upgrade: mapdata" (T327151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
00:20 zabe@deploy1002: zabe and zabe: Backport for Add script to rename a change tag in wmf prod (T327118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
2023-01-17
22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for Make sticky header edit button default for all wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for Enable Phonos on afwiktionary and arwiki (T324561) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
19:50 ryankemper: T327175 Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
18:41 zabe@deploy1002: zabe and zabe: Backport for Revert "Enable visual enhancements on all talk namespaces" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
17:01 effie: restarting confd on deploy1002
16:59 effie: pooling back depooled mw servers in codfw
16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: T325557
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- T327001
14:52 urandom: disabling Cassandra hinted-handoff for codfw -- T327001
11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T326134', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T326134', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T326134', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T326134
06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T326134', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
10:48 moritzm: installing libtasn1-6 security updates on Bullseye
10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
08:14 oblivian@deploy1002: Synchronized README: test null deployment for T327041 (duration: 07m 12s)
08:09 Emperor: stopped swift_rclone_sync on ms-be1069
20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
2023-01-12
21:41 thcipriani@deploy1002: thcipriani and stang: Backport for nlwiki: Add block right to checkuser group (T326355) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:19 thcipriani@deploy1002: thcipriani and stang: Backport for etwikiquote: Switch logo variant back (T313698) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs T325581
18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
17:45 mutante: powercycling mc2040 via mgmt console
15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote T321288
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
14:37 moritzm: installing sqlite3 security updates on buster
14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
14:20 taavi@deploy1002: taavi and matmarex: Backport for Track callers of parseRevisionParsoidHtml. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary T326138
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:24 XioNoX: rollback redirect ns2 to authdns1001 - T316532
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
10:01 XioNoX: reboot asw2-esams for upgrade - T316532
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
09:47 btullis@cumin1001: Added views for new wiki: pcmwiki T310879
09:46 XioNoX: redirect ns2 to authdns1001 - T316532
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json
2023-01-11
23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs T325581
23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
21:06 kindrobot: start UTC late backport window
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad T321391
18:52 brett: Removing legacy vips from dns servers - T239993
18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad T321391
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
18:22 btullis@cumin1001: Added views for new wiki: blkwiki T310872
18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad T321391
16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad T321391
16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad T321391
16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad T321391
16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad T321391
16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
15:36 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Fix test constructing HTMLFormField without parent (T326621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
12:42 moritzm: installing postgresql 11 security updates on maps/codfw
12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
2023-01-10
23:22 zabe@deploy1002: zabe and zabe: Backport for Start writing to rev_comment_id on test wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42964 and previous config saved to /var/cache/conftool/dbconfig/20230110-193245-ladsgroup.json
19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
17:03 ayounsi@deploy1002: deploy aborted: netbox-next to 3.2.9 (duration: 00m 07s)
17:03 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42952 and previous config saved to /var/cache/conftool/dbconfig/20230110-165952-ladsgroup.json
16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After the incident', diff saved to https://phabricator.wikimedia.org/P42951 and previous config saved to /var/cache/conftool/dbconfig/20230110-165406-root.json
16:48 bblack: depooling eqsin from DNS
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42950 and previous config saved to /var/cache/conftool/dbconfig/20230110-164447-ladsgroup.json
16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After the incident', diff saved to https://phabricator.wikimedia.org/P42949 and previous config saved to /var/cache/conftool/dbconfig/20230110-163901-root.json
16:36 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:24 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After the incident', diff saved to https://phabricator.wikimedia.org/P42948 and previous config saved to /var/cache/conftool/dbconfig/20230110-162356-root.json
16:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:14 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2002.codfw.wmnet with OS bullseye
16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After the incident', diff saved to https://phabricator.wikimedia.org/P42947 and previous config saved to /var/cache/conftool/dbconfig/20230110-160851-root.json
16:08 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
16:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
15:59 SandraEbele: reran failed pageview-druid-hourly-coord oozie job for 2023-1-10-10.
15:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1373,1384-1385,1387].eqiad.wmnet
15:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1373,1384-1385,1387].eqiad.wmnet
15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After the incident', diff saved to https://phabricator.wikimedia.org/P42946 and previous config saved to /var/cache/conftool/dbconfig/20230110-155346-root.json
15:52 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2002.codfw.wmnet with OS bullseye
15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After the incident', diff saved to https://phabricator.wikimedia.org/P42945 and previous config saved to /var/cache/conftool/dbconfig/20230110-153841-root.json
15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After the incident', diff saved to https://phabricator.wikimedia.org/P42944 and previous config saved to /var/cache/conftool/dbconfig/20230110-152336-root.json
15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
15:17 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
15:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
15:02 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2037.codfw.wmnet
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:56 XioNoX: start VC link maintenance in eqiad - T325803
14:55 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
14:49 zabe: UTC afternoon deploys done
14:49 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
14:48 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:36 zabe: run populateCulActor on group0 wikis # T325484
14:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
14:35 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2037.codfw.wmnet
14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host apifeatureusage2001.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2036.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:28 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:06 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
14:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2035.codfw.wmnet
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:49 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
13:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb-test2001.codfw.wmnet
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
12:59 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1002.eqiad.wmnet with OS bullseye
12:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
12:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2034.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
12:12 claime: Finished rolling reboot of eqiad jobrunners
12:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
10:13 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
10:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2033.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
10:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
09:22 taavi: added zabe to wmf-deployment gerrit group T326327
09:19 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2033.codfw.wmnet
09:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2032.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
08:58 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
08:56 godog: upgrade thanos to 0.30.1 on thanos-fe1001 - T303154
08:54 godog: upgrade thanos to 0.30.1 on prometheus2006 - T303154
07:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2032.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2031.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:33 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2044.codfw.wmnet
07:28 XioNoX: depool ulsfo for network maintenance - T316532
07:22 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2031.codfw.wmnet
07:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T326133', diff saved to https://phabricator.wikimedia.org/P42940 and previous config saved to /var/cache/conftool/dbconfig/20230110-070223-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T326133', diff saved to https://phabricator.wikimedia.org/P42939 and previous config saved to /var/cache/conftool/dbconfig/20230110-070152-ladsgroup.json
07:01 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T326133
06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T326133', diff saved to https://phabricator.wikimedia.org/P42938 and previous config saved to /var/cache/conftool/dbconfig/20230110-062309-ladsgroup.json
06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
00:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
2023-01-09
22:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
22:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:32 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2030.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2030.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2029.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
21:38 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2029.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2027.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:34 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
21:29 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:21 kindrobot: starting UTC late backport window
21:21 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2027.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2026.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P42936 and previous config saved to /var/cache/conftool/dbconfig/20230109-210940-marostegui.json
21:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
19:37 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:33 bblack: cp5032: set param transit_buffer=4M via varnishadm
19:26 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2025.codfw.wmnet
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2025.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2024.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2024.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2023.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:41 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
18:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
18:30 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2023.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2022.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:56 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2022.codfw.wmnet
17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
17:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2021.codfw.wmnet
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:48 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
16:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2020.codfw.wmnet
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2020.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2019.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:08 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:04 XioNoX: start VC link maintenance in eqiad - T325803
15:29 claime: Not starting codfw jobrunner rolling reboot, deploy in progress
15:28 claime: Starting codfw jobrunner rolling reboot
15:26 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kartik: Backport for CX: Allow composer/installers plugin synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:11 effie: disable puppet on all 'P:mediawiki::mcrouter_wancache' hosts to merge 875894
15:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
15:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
15:02 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for extwiki: Install SandboxLink extension (T326450) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
14:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
14:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for jawikisource: Update project logo and wordmark (T326488) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for arwiki: Create extendedmover group (T326434) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:35 dcausse: restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
08:58 moritzm: installing glibc security updates
08:56 XioNoX: depool ulsfo for network maintenance - T316532
08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700
08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 327700
08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237
08:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 48237
08:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035
08:21 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org
08:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32035
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
08:08 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
18:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:13 Krinkle: krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431
17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye
17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
17:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
16:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
16:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42928 and previous config saved to /var/cache/conftool/dbconfig/20230106-004102-ladsgroup.json
00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42927 and previous config saved to /var/cache/conftool/dbconfig/20230106-002556-ladsgroup.json
00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42926 and previous config saved to /var/cache/conftool/dbconfig/20230106-001049-ladsgroup.json
2023-01-05
23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42925 and previous config saved to /var/cache/conftool/dbconfig/20230105-235543-ladsgroup.json
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42924 and previous config saved to /var/cache/conftool/dbconfig/20230105-235325-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42923 and previous config saved to /var/cache/conftool/dbconfig/20230105-235304-ladsgroup.json
23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42922 and previous config saved to /var/cache/conftool/dbconfig/20230105-233758-ladsgroup.json
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42921 and previous config saved to /var/cache/conftool/dbconfig/20230105-232251-ladsgroup.json
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42920 and previous config saved to /var/cache/conftool/dbconfig/20230105-230745-ladsgroup.json
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42919 and previous config saved to /var/cache/conftool/dbconfig/20230105-230629-ladsgroup.json
23:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42918 and previous config saved to /var/cache/conftool/dbconfig/20230105-230607-ladsgroup.json
22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42917 and previous config saved to /var/cache/conftool/dbconfig/20230105-225101-ladsgroup.json
22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42916 and previous config saved to /var/cache/conftool/dbconfig/20230105-223554-ladsgroup.json
22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42915 and previous config saved to /var/cache/conftool/dbconfig/20230105-222048-ladsgroup.json
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42914 and previous config saved to /var/cache/conftool/dbconfig/20230105-221932-ladsgroup.json
22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42913 and previous config saved to /var/cache/conftool/dbconfig/20230105-221911-ladsgroup.json
22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42912 and previous config saved to /var/cache/conftool/dbconfig/20230105-220404-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42911 and previous config saved to /var/cache/conftool/dbconfig/20230105-214858-ladsgroup.json
21:43 TheresNoTime: closing UTC late backport window
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42910 and previous config saved to /var/cache/conftool/dbconfig/20230105-213351-ladsgroup.json
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42909 and previous config saved to /var/cache/conftool/dbconfig/20230105-213235-ladsgroup.json
21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42908 and previous config saved to /var/cache/conftool/dbconfig/20230105-213214-ladsgroup.json
21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42907 and previous config saved to /var/cache/conftool/dbconfig/20230105-211707-ladsgroup.json
21:08 samtar@deploy1002: samtar and zabe: Backport for Start writing to cuc_comment_id everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42906 and previous config saved to /var/cache/conftool/dbconfig/20230105-210201-ladsgroup.json
20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42905 and previous config saved to /var/cache/conftool/dbconfig/20230105-204654-ladsgroup.json
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42904 and previous config saved to /var/cache/conftool/dbconfig/20230105-204438-ladsgroup.json
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42903 and previous config saved to /var/cache/conftool/dbconfig/20230105-204403-ladsgroup.json
20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42902 and previous config saved to /var/cache/conftool/dbconfig/20230105-202856-ladsgroup.json
20:17 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42901 and previous config saved to /var/cache/conftool/dbconfig/20230105-201350-ladsgroup.json
19:59 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.17 refs T325580
19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42900 and previous config saved to /var/cache/conftool/dbconfig/20230105-195843-ladsgroup.json
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42899 and previous config saved to /var/cache/conftool/dbconfig/20230105-195627-ladsgroup.json
19:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42898 and previous config saved to /var/cache/conftool/dbconfig/20230105-195606-ladsgroup.json
19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42897 and previous config saved to /var/cache/conftool/dbconfig/20230105-194059-ladsgroup.json
19:38 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.10-1wm3_amd64.changes: T325797
19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42896 and previous config saved to /var/cache/conftool/dbconfig/20230105-192553-ladsgroup.json
19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42895 and previous config saved to /var/cache/conftool/dbconfig/20230105-191046-ladsgroup.json
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42894 and previous config saved to /var/cache/conftool/dbconfig/20230105-190830-ladsgroup.json
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42893 and previous config saved to /var/cache/conftool/dbconfig/20230105-190724-ladsgroup.json
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42892 and previous config saved to /var/cache/conftool/dbconfig/20230105-185217-ladsgroup.json
18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42891 and previous config saved to /var/cache/conftool/dbconfig/20230105-183711-ladsgroup.json
18:22 taavi: delete some nostalgiawiki pages using maintenance/deleteBatch.php for T326334
18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42890 and previous config saved to /var/cache/conftool/dbconfig/20230105-182204-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42889 and previous config saved to /var/cache/conftool/dbconfig/20230105-181949-ladsgroup.json
18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42888 and previous config saved to /var/cache/conftool/dbconfig/20230105-181928-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42887 and previous config saved to /var/cache/conftool/dbconfig/20230105-180421-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42886 and previous config saved to /var/cache/conftool/dbconfig/20230105-174915-ladsgroup.json
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42885 and previous config saved to /var/cache/conftool/dbconfig/20230105-173408-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42884 and previous config saved to /var/cache/conftool/dbconfig/20230105-173154-ladsgroup.json
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42883 and previous config saved to /var/cache/conftool/dbconfig/20230105-173133-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42882 and previous config saved to /var/cache/conftool/dbconfig/20230105-171626-ladsgroup.json
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42880 and previous config saved to /var/cache/conftool/dbconfig/20230105-170119-ladsgroup.json
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42878 and previous config saved to /var/cache/conftool/dbconfig/20230105-164612-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42877 and previous config saved to /var/cache/conftool/dbconfig/20230105-164358-ladsgroup.json
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42876 and previous config saved to /var/cache/conftool/dbconfig/20230105-164258-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42875 and previous config saved to /var/cache/conftool/dbconfig/20230105-162751-ladsgroup.json
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42874 and previous config saved to /var/cache/conftool/dbconfig/20230105-161245-ladsgroup.json
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42873 and previous config saved to /var/cache/conftool/dbconfig/20230105-155738-ladsgroup.json
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42872 and previous config saved to /var/cache/conftool/dbconfig/20230105-155524-ladsgroup.json
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42871 and previous config saved to /var/cache/conftool/dbconfig/20230105-155503-ladsgroup.json
15:52 matthiasmullie: UTC afternoon backports done
15:51 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 12m 21s)
15:41 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42870 and previous config saved to /var/cache/conftool/dbconfig/20230105-153956-ladsgroup.json
15:37 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 08m 04s)
15:31 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42869 and previous config saved to /var/cache/conftool/dbconfig/20230105-152447-ladsgroup.json
15:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42868 and previous config saved to /var/cache/conftool/dbconfig/20230105-150939-ladsgroup.json
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42867 and previous config saved to /var/cache/conftool/dbconfig/20230105-150825-ladsgroup.json
15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42866 and previous config saved to /var/cache/conftool/dbconfig/20230105-150804-ladsgroup.json
14:58 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
14:56 claime: hard resetting mw1486
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42865 and previous config saved to /var/cache/conftool/dbconfig/20230105-145257-ladsgroup.json
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42864 and previous config saved to /var/cache/conftool/dbconfig/20230105-143751-ladsgroup.json
14:23 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42862 and previous config saved to /var/cache/conftool/dbconfig/20230105-142244-ladsgroup.json
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42861 and previous config saved to /var/cache/conftool/dbconfig/20230105-142029-ladsgroup.json
14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42860 and previous config saved to /var/cache/conftool/dbconfig/20230105-142008-ladsgroup.json
14:11 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42859 and previous config saved to /var/cache/conftool/dbconfig/20230105-140501-ladsgroup.json
13:58 Amir1: start of externallinks migration in elwiki (and rest of large wikis in s3) (T326314)
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42858 and previous config saved to /var/cache/conftool/dbconfig/20230105-134955-ladsgroup.json
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42857 and previous config saved to /var/cache/conftool/dbconfig/20230105-133448-ladsgroup.json
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42856 and previous config saved to /var/cache/conftool/dbconfig/20230105-133234-ladsgroup.json
13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42855 and previous config saved to /var/cache/conftool/dbconfig/20230105-133211-ladsgroup.json
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42854 and previous config saved to /var/cache/conftool/dbconfig/20230105-131705-ladsgroup.json
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42853 and previous config saved to /var/cache/conftool/dbconfig/20230105-130158-ladsgroup.json
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42852 and previous config saved to /var/cache/conftool/dbconfig/20230105-124651-ladsgroup.json
07:58 moritzm: installing glibc security updates on bullseye
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db2151 in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42836 and previous config saved to /var/cache/conftool/dbconfig/20230105-075046-marostegui.json
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to clone db1176 T326211', diff saved to https://phabricator.wikimedia.org/P42833 and previous config saved to /var/cache/conftool/dbconfig/20230105-064153-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2151 for the first time in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42832 and previous config saved to /var/cache/conftool/dbconfig/20230105-063937-marostegui.json
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
22:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42831 and previous config saved to /var/cache/conftool/dbconfig/20230104-223545-marostegui.json
22:27 kindrobot: finished UTC late backport window
22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42828 and previous config saved to /var/cache/conftool/dbconfig/20230104-222038-marostegui.json
22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42827 and previous config saved to /var/cache/conftool/dbconfig/20230104-220532-marostegui.json
21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42826 and previous config saved to /var/cache/conftool/dbconfig/20230104-215025-marostegui.json
21:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42825 and previous config saved to /var/cache/conftool/dbconfig/20230104-214616-marostegui.json
21:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42824 and previous config saved to /var/cache/conftool/dbconfig/20230104-214555-marostegui.json
21:35 kindrobot@deploy1002: kindrobot and jhsoby: Backport for Add namespace to gorwiktionary (T326253) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42823 and previous config saved to /var/cache/conftool/dbconfig/20230104-213049-marostegui.json
21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42820 and previous config saved to /var/cache/conftool/dbconfig/20230104-211542-marostegui.json
21:05 kindrobot: starting UTC late backport window
21:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42819 and previous config saved to /var/cache/conftool/dbconfig/20230104-210036-marostegui.json
20:58 Amir1: running refreshGlobalimagelinks.php on all wikis (T322588)
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42818 and previous config saved to /var/cache/conftool/dbconfig/20230104-205628-marostegui.json
20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42817 and previous config saved to /var/cache/conftool/dbconfig/20230104-205607-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42816 and previous config saved to /var/cache/conftool/dbconfig/20230104-204100-marostegui.json
20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42815 and previous config saved to /var/cache/conftool/dbconfig/20230104-202554-marostegui.json
20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42814 and previous config saved to /var/cache/conftool/dbconfig/20230104-201047-marostegui.json
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42813 and previous config saved to /var/cache/conftool/dbconfig/20230104-200638-marostegui.json
20:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42812 and previous config saved to /var/cache/conftool/dbconfig/20230104-200617-marostegui.json
19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42811 and previous config saved to /var/cache/conftool/dbconfig/20230104-195110-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42810 and previous config saved to /var/cache/conftool/dbconfig/20230104-193604-marostegui.json
19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.17 refs T325580
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42809 and previous config saved to /var/cache/conftool/dbconfig/20230104-192057-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42808 and previous config saved to /var/cache/conftool/dbconfig/20230104-191648-marostegui.json
19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42807 and previous config saved to /var/cache/conftool/dbconfig/20230104-191627-marostegui.json
19:07 dancy@deploy1002: Installing scap version "4.32.0" for 560 hosts
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42806 and previous config saved to /var/cache/conftool/dbconfig/20230104-190121-marostegui.json
18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42805 and previous config saved to /var/cache/conftool/dbconfig/20230104-184614-marostegui.json
18:40 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided)
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42804 and previous config saved to /var/cache/conftool/dbconfig/20230104-183108-marostegui.json
18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42803 and previous config saved to /var/cache/conftool/dbconfig/20230104-182700-marostegui.json
18:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42802 and previous config saved to /var/cache/conftool/dbconfig/20230104-182425-marostegui.json
18:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules) (duration: 00m 54s)
18:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules)
18:09 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42801 and previous config saved to /var/cache/conftool/dbconfig/20230104-180918-marostegui.json
18:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42800 and previous config saved to /var/cache/conftool/dbconfig/20230104-175412-marostegui.json
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42799 and previous config saved to /var/cache/conftool/dbconfig/20230104-173905-marostegui.json
17:37 dancy@deploy1002: Installing scap version "4.31.1" for 560 hosts
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42798 and previous config saved to /var/cache/conftool/dbconfig/20230104-173455-marostegui.json
17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42797 and previous config saved to /var/cache/conftool/dbconfig/20230104-173434-marostegui.json
17:28 dancy@deploy1002: Started scap: testing
17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42796 and previous config saved to /var/cache/conftool/dbconfig/20230104-171928-marostegui.json
17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42795 and previous config saved to /var/cache/conftool/dbconfig/20230104-170421-marostegui.json
16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42794 and previous config saved to /var/cache/conftool/dbconfig/20230104-164915-marostegui.json
16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42793 and previous config saved to /var/cache/conftool/dbconfig/20230104-164504-marostegui.json
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:30 dancy@deploy1002: Installing scap version "4.31.0" for 560 hosts
16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42792 and previous config saved to /var/cache/conftool/dbconfig/20230104-162828-marostegui.json
16:29 dancy@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 13s)
16:27 dancy@deploy1002: Started scap: (no justification provided)
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42791 and previous config saved to /var/cache/conftool/dbconfig/20230104-161321-marostegui.json
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42790 and previous config saved to /var/cache/conftool/dbconfig/20230104-155815-marostegui.json
15:51 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42789 and previous config saved to /var/cache/conftool/dbconfig/20230104-154308-marostegui.json
15:34 moritzm: installing glibc security updates on bullseye
15:34 moritzm: installing glibc security updates
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42788 and previous config saved to /var/cache/conftool/dbconfig/20230104-153435-marostegui.json
15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42787 and previous config saved to /var/cache/conftool/dbconfig/20230104-153413-marostegui.json
15:32 claime: Restarting rolling reboot of api_appserver hosts in codfw
15:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Disable LoadMonitor in CLI (T322156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42786 and previous config saved to /var/cache/conftool/dbconfig/20230104-151907-marostegui.json
15:06 marostegui: dbmaint deploy schema change on s5 eqiad T326224
15:05 marostegui: dbmaint deploy schema change on s3 eqiad T326224
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42785 and previous config saved to /var/cache/conftool/dbconfig/20230104-150400-marostegui.json
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42784 and previous config saved to /var/cache/conftool/dbconfig/20230104-144853-marostegui.json
14:46 marostegui: dbmaint deploy schema change on s3 eqiad T326222
14:44 marostegui: dbmaint deploy schema change on s5 eqiad T326222
14:42 XioNoX: fix inconsistent mtu between cr1-eqiad<->lsw1-f1 - T315838
14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42783 and previous config saved to /var/cache/conftool/dbconfig/20230104-144025-marostegui.json
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 urbanecm: UTC afternoon B&C window done
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42782 and previous config saved to /var/cache/conftool/dbconfig/20230104-143949-marostegui.json
14:38 marostegui: dbmaint deploy schema change on s3 eqiad T326223
14:27 XioNoX: fix inconsistent mtu on mr1-codfw - T315838
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42781 and previous config saved to /var/cache/conftool/dbconfig/20230104-142442-marostegui.json
14:24 marostegui: dbmaint deploy schema change on s7 eqiad T326227
14:22 XioNoX: fix inconsistent mtu on mr1-eqsin - T315838
14:15 XioNoX: fix inconsistent mtu on mr1-esams - T315838
14:14 marostegui: dbmaint deploy schema change on s7 eqiad T326228
14:13 marostegui: dbmaint deploy schema change on s7 eqiad T326226
14:11 marostegui: dbmaint deploy schema change on s8 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s7 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s6 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s5 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s4 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s3 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s2 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s1 eqiad T326221
14:10 marostegui: dbmaint deploy schema change on s7 eqiad T326225
14:10 marostegui: dbmaint deploy schema change on s7 T326225
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42780 and previous config saved to /var/cache/conftool/dbconfig/20230104-140936-marostegui.json
13:58 marostegui: dbmaint deploy schema change on s7 T326221
13:57 marostegui: dbmaint deploy schema change on s8 T326221
13:57 marostegui: dbmaint deploy schema change on s6 T326221
13:56 marostegui: dbmaint deploy schema change on s5 T326221
13:55 marostegui: dbmaint deploy schema change on s4 T326221
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42779 and previous config saved to /var/cache/conftool/dbconfig/20230104-135429-marostegui.json
13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42777 and previous config saved to /var/cache/conftool/dbconfig/20230104-133830-marostegui.json
13:33 XioNoX: fix mismatched MTU on pfw3-codfw - T315838
13:31 urbanecm: New wiki creation will run over by a couple of minutes
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42776 and previous config saved to /var/cache/conftool/dbconfig/20230104-132323-marostegui.json
13:15 XioNoX: fix mismatched MTU on cloudsw switches - T315838
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42775 and previous config saved to /var/cache/conftool/dbconfig/20230104-130816-marostegui.json
12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42774 and previous config saved to /var/cache/conftool/dbconfig/20230104-125330-root.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42773 and previous config saved to /var/cache/conftool/dbconfig/20230104-125310-marostegui.json
12:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:50 urbanecm@deploy1002: Started scap: Creating shnwikibooks (T321248)
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42772 and previous config saved to /var/cache/conftool/dbconfig/20230104-124424-marostegui.json
12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42771 and previous config saved to /var/cache/conftool/dbconfig/20230104-124403-marostegui.json
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42770 and previous config saved to /var/cache/conftool/dbconfig/20230104-123825-root.json
12:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
12:31 urbanecm@deploy1002: Started scap: Creating aswikiquote (T321246)
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42769 and previous config saved to /var/cache/conftool/dbconfig/20230104-122857-marostegui.json
12:27 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42767 and previous config saved to /var/cache/conftool/dbconfig/20230104-121350-marostegui.json
12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42766 and previous config saved to /var/cache/conftool/dbconfig/20230104-120815-root.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42765 and previous config saved to /var/cache/conftool/dbconfig/20230104-115844-marostegui.json
11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42764 and previous config saved to /var/cache/conftool/dbconfig/20230104-115310-root.json
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42763 and previous config saved to /var/cache/conftool/dbconfig/20230104-115011-marostegui.json
11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42761 and previous config saved to /var/cache/conftool/dbconfig/20230104-113805-root.json
11:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb2003.codfw.wmnet
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2151 to dbctl depooled T326206', diff saved to https://phabricator.wikimedia.org/P42759 and previous config saved to /var/cache/conftool/dbconfig/20230104-112801-marostegui.json
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42758 and previous config saved to /var/cache/conftool/dbconfig/20230104-112300-root.json
11:02 vgutierrez: testing HAProxy 2.4.20 in cp4037 and cp4045
10:56 vgutierrez: (apt1001) import HAProxy 2.4.20 from third-party repo for buster and bullseye
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 894 hosts
10:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 894 hosts
09:01 mlitn@deploy1002: mlitn and mlitn: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
09:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P42755 and previous config saved to /var/cache/conftool/dbconfig/20230104-082942-root.json
08:26 marostegui: dbmaint codfw deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T255174
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T255174
08:25 mlitn@deploy1002: mlitn and mlitn: Backport for Always show search results at full width (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:55 mutante: gitlab-runner* - correction: allowing connections TO kubestagemaster.svc.eqiad.wmnet port 6443 FROM trusted runners, of course - T325385
21:53 mutante: gitlab-runner* - allowing kubestagemaster.svc.eqiad.wmnet to connect to port 6443, run puppet via cumin, deploy gerrit:868737 - T325385
21:19 taavi@deploy1002: taavi and zabe: Backport for Start writing to cuc_comment_id on test wikis (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
15:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts graphite1004.eqiad.wmnet
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:48 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:07 oblivian@deploy1002: oblivian and oblivian: Backport for etcd: use the v3-style SRV record (T320397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:04 marostegui: Change x1 binlog format to STATEMENT T255174
11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
10:58 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
10:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
10:43 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
10:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1001.wikimedia.org
10:31 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1001.wikimedia.org
10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
09:27 vgutierrez: restarting varnish on cp5032 to clear VarnishChildRestarted alert - T325797
08:12 phedenskog@deploy1002: Started deploy [performance/navtiming@4f8c010]: (no justification provided)
08:05 kartik@deploy1002: kartik and kartik: Backport for Content Translation: Move ttwiki out of Beta (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet