Server Admin Log/Archive 64

2023-03-31

23:55 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus5002.eqsin.wmnet with reason: host reimage
23:52 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus5002.eqsin.wmnet with reason: host reimage
23:21 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus5002.eqsin.wmnet with OS bullseye
23:14 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus6002.drmrs.wmnet with OS bullseye
23:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus4002.ulsfo.wmnet with OS bullseye
23:02 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus5002.eqsin.wmnet
23:02 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
23:01 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
23:01 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus6002.drmrs.wmnet with reason: host reimage
22:58 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus4002.ulsfo.wmnet with reason: host reimage
22:57 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus6002.drmrs.wmnet with reason: host reimage
22:55 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus4002.ulsfo.wmnet with reason: host reimage
22:43 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus6002.drmrs.wmnet with OS bullseye
22:41 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus4002.ulsfo.wmnet with OS bullseye
22:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on miscweb[2002-2003].codfw.wmnet,miscweb[1002-1003].eqiad.wmnet with reason: maintenance
22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on miscweb[2002-2003].codfw.wmnet,miscweb[1002-1003].eqiad.wmnet with reason: maintenance
22:01 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
22:01 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
22:01 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:01 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
22:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
22:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1132.eqiad.wmnet
22:00 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
21:58 denisse@cumin1001: START - Cookbook sre.dns.netbox
21:58 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
21:57 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host prometheus6002.drmrs.wmnet with OS bullseye
21:52 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host prometheus4002.ulsfo.wmnet with OS bullseye
21:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
21:12 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5002
21:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:11 denisse@cumin1001: START - Cookbook sre.dns.netbox
21:07 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5002
21:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus5002.eqsin.wmnet
21:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
21:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
21:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
21:04 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
21:02 denisse@cumin1001: START - Cookbook sre.dns.netbox
21:02 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
21:02 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
21:02 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:02 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
21:00 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
20:58 denisse@cumin1001: START - Cookbook sre.dns.netbox
20:58 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
20:41 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus5002.eqsin.wmnet
20:41 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
20:41 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
20:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
20:39 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
20:38 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus6002.drmrs.wmnet with OS bullseye
20:38 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus4002.ulsfo.wmnet with OS bullseye
20:38 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus6002.drmrs.wmnet
20:38 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
20:37 denisse@cumin1001: START - Cookbook sre.dns.netbox
20:37 denisse@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
20:37 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus4002.ulsfo.wmnet
20:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
20:37 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
20:33 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
20:30 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
20:16 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
20:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
19:58 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus3002.esams.wmnet with OS bullseye
19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
19:45 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus3002.esams.wmnet with reason: host reimage
19:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
19:42 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus3002.esams.wmnet with reason: host reimage
19:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
19:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
19:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
19:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus6002.drmrs.wmnet on all recursors
19:37 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus6002.drmrs.wmnet on all recursors
19:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
19:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
19:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1152.eqiad.wmnet']
19:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:34 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus6002.drmrs.wmnet
19:33 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
19:33 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
19:33 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073.eqiad.wmnet']
19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
19:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
19:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073.eqiad.wmnet']
19:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
19:30 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
19:30 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus4002.ulsfo.wmnet on all recursors
19:30 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus4002.ulsfo.wmnet on all recursors
19:30 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:30 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
19:29 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
19:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
19:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
19:28 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
19:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:26 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
19:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:26 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus4002.ulsfo.wmnet
19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1154.eqiad.wmnet']
19:24 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus3002.esams.wmnet with OS bullseye
19:14 andrewbogott: upgraded wikitech-static to 1.39.3
19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1153.eqiad.wmnet']
19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153.eqiad.wmnet']
19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154.eqiad.wmnet']
18:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1155.eqiad.wmnet']
18:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1156.eqiad.wmnet']
18:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152.eqiad.wmnet']
18:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1151.eqiad.wmnet']
18:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155.eqiad.wmnet']
18:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156.eqiad.wmnet']
18:41 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus3002.esams.wmnet
18:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
18:40 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
18:23 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@30fae0e]: (no justification provided) (duration: 00m 20s)
18:23 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@30fae0e]: (no justification provided)
18:22 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@30fae0e]: bump discolytics to 0.12.0 (duration: 00m 20s)
18:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@30fae0e]: bump discolytics to 0.12.0
18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1004.eqiad.wmnet with OS bullseye
18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
17:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
17:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
17:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
17:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
17:40 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
17:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:39 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:36 denisse@cumin1001: START - Cookbook sre.dns.netbox
17:36 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
17:32 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus3002.esams.wmnet
17:32 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
17:27 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3002.esams.wmnet
17:23 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus3002.esams.wmnet
17:23 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
17:23 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
17:23 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:23 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:22 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:20 denisse@cumin1001: START - Cookbook sre.dns.netbox
17:20 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
17:20 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
17:20 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:20 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:19 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:18 aqu@deploy2002: Finished deploy [airflow-dags/analytics@9182e44]: Fix for VirtualPageview Dag - Analytics [airflow-dags@9182e44] (duration: 00m 11s)
17:18 aqu@deploy2002: Started deploy [airflow-dags/analytics@9182e44]: Fix for VirtualPageview Dag - Analytics [airflow-dags@9182e44]
17:17 denisse@cumin1001: START - Cookbook sre.dns.netbox
17:17 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
17:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@48778b4]: bump discolytics to 0.11.0 (duration: 00m 19s)
17:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@48778b4]: bump discolytics to 0.11.0
17:16 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus3002.esams.wmnet
17:16 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
17:16 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
17:16 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:16 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:15 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
17:13 denisse@cumin1001: START - Cookbook sre.dns.netbox
17:13 denisse@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
16:55 sukhe: restart pybal on lvs4008 to set it primary LVS for high-traffic1
16:54 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2aae7d0]: Fix for VirtualPageview Dag - Analytics [airflow-dags@2aae7d0] (duration: 00m 10s)
16:54 aqu@deploy2002: Started deploy [airflow-dags/analytics@2aae7d0]: Fix for VirtualPageview Dag - Analytics [airflow-dags@2aae7d0]
16:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:15 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
16:15 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
16:10 ladsgroup@deploy2002: Finished scap: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612) (duration: 08m 18s)
16:03 ladsgroup@deploy2002: matmarex and ladsgroup: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
16:02 ladsgroup@deploy2002: Started scap: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612)
16:00 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
16:00 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
15:49 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:49 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1014.eqiad.wmnet with OS bullseye
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:26 ladsgroup@deploy1002: Finished scap: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php""" (duration: 19m 14s)
15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:14 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php""" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
15:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
15:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:06 ladsgroup@deploy1002: Started scap: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php"""
15:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
14:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1014.eqiad.wmnet with OS bullseye
14:47 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host ms-fe1014.eqiad.wmnet
14:47 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
14:43 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe1014.eqiad.wmnet
14:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe1014.eqiad.wmnet
14:43 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe1014.eqiad.wmnet
14:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1013.eqiad.wmnet with OS bullseye
13:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
13:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
13:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
13:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
13:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
13:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
13:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
13:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
13:12 elukey: move kafka-jumbo1004's kafka broker cert to PKI - T296064
13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI
13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI
13:11 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:10 phedenskog@deploy2002: Finished deploy [performance/navtiming@c30b954]: (no justification provided) (duration: 00m 05s)
13:10 phedenskog@deploy2002: Started deploy [performance/navtiming@c30b954]: (no justification provided)
13:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:09 elukey: restart kafkatee on centrallog2002 - test to see if there are issues connecting to the jumbo brokers running pki
12:55 eoghan@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab
12:46 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
12:45 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:25 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:04 eoghan@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab
12:00 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab
11:42 Emperor: shutdown ms-be1042 for battery swap T332883
11:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW
11:41 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW
11:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151.eqiad.wmnet']
11:09 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab
11:08 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab
11:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
10:45 Amir1: Failover m1 from db1101 to db1164 - T333123
10:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
10:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149.eqiad.wmnet']
10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
10:25 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover
10:25 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover
10:18 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab
10:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:54 elukey: move kafka-jumbo1003's kafka broker cert to PKI - T296064
09:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance
09:54 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance
09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI
09:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI
09:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI
09:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI
09:02 elukey: move kafka-jumbo1002's kafka broker cert to PKI - T296064
08:47 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
08:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on an-worker1091.eqiad.wmnet with reason: Replacing battery
08:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on an-worker1091.eqiad.wmnet with reason: Replacing battery
08:32 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
08:27 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:25 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:25 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:14 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
06:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
06:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
06:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
06:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
06:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
06:43 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit1003.wikimedia.org with OS bullseye
01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:04 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
01:00 ejegg: payments-wiki upgraded from b5df483f to 60d0aed5
00:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1225.eqiad.wmnet with OS bullseye
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1224.eqiad.wmnet with OS bullseye
00:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
00:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: host reimage
00:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: host reimage
00:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149.eqiad.wmnet']
00:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1223.eqiad.wmnet with OS bullseye
00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1156.mgmt.eqiad.wmnet with reboot policy FORCED
00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1156.mgmt.eqiad.wmnet with reboot policy FORCED
00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
00:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1224.eqiad.wmnet with reason: host reimage
00:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
00:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1225.eqiad.wmnet with OS bullseye
00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bullseye
00:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1222.eqiad.wmnet with OS bullseye
00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:06 cstone: SmashPig upgraded from e86b0a66 to 7c19151f
00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1224.eqiad.wmnet with reason: host reimage
00:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
00:04 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
00:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:04 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
00:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:02 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"

2023-03-30

23:59 denisse@cumin1001: START - Cookbook sre.dns.netbox
23:59 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
23:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
23:59 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1154.mgmt.eqiad.wmnet with reboot policy FORCED
23:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1154.mgmt.eqiad.wmnet with reboot policy FORCED
23:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
23:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1224.eqiad.wmnet with OS bullseye
23:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1222.eqiad.wmnet with reason: host reimage
23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1222.eqiad.wmnet with reason: host reimage
23:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1152.mgmt.eqiad.wmnet with reboot policy FORCED
23:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1223.eqiad.wmnet with OS bullseye
23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1209.eqiad.wmnet with OS bullseye
23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1152.mgmt.eqiad.wmnet with reboot policy FORCED
23:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1151.mgmt.eqiad.wmnet with reboot policy FORCED
23:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
23:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1222.eqiad.wmnet with OS bullseye
23:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1151.mgmt.eqiad.wmnet with reboot policy FORCED
23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1220.eqiad.wmnet with OS bullseye
23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1221.eqiad.wmnet with OS bullseye
23:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1209.eqiad.wmnet with reason: host reimage
23:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
23:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
23:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1209.eqiad.wmnet with reason: host reimage
23:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bullseye
23:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1221.eqiad.wmnet with reason: host reimage
23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1221.eqiad.wmnet with reason: host reimage
23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1220.eqiad.wmnet with reason: host reimage
23:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1209.eqiad.wmnet with OS bullseye
23:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1220.eqiad.wmnet with reason: host reimage
22:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
22:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1221.eqiad.wmnet with OS bullseye
22:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1220.eqiad.wmnet with OS bullseye
22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1209']
22:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
22:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1209']
22:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
22:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db1209']
22:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1210']
21:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1209']
21:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1210']
21:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
21:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1225']
21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1218.eqiad.wmnet with OS bullseye
21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1219.eqiad.wmnet with OS bullseye
21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
21:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
20:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1225']
20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1219.eqiad.wmnet with reason: host reimage
20:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1218.eqiad.wmnet with reason: host reimage
20:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1219.eqiad.wmnet with reason: host reimage
20:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1218.eqiad.wmnet with reason: host reimage
20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1223']
20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1224']
20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1219.eqiad.wmnet with OS bullseye
20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1218.eqiad.wmnet with OS bullseye
20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS bullseye
20:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1216.eqiad.wmnet with OS bullseye
20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1224']
20:20 thcipriani@deploy2002: Finished scap: Backport for Remove inline script from United States static page (T331681) (duration: 09m 42s)
20:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1223']
20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
20:12 thcipriani@deploy2002: nray and thcipriani: Backport for Remove inline script from United States static page (T331681) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: host reimage
20:11 thcipriani@deploy2002: Started scap: Backport for Remove inline script from United States static page (T331681)
20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1216.eqiad.wmnet with reason: host reimage
20:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1221']
20:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1222']
19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS bullseye
19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1216.eqiad.wmnet with OS bullseye
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1214.eqiad.wmnet with OS bullseye
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1215.eqiad.wmnet with OS bullseye
19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1214.eqiad.wmnet with reason: host reimage
19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1215.eqiad.wmnet with reason: host reimage
19:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1214.eqiad.wmnet with reason: host reimage
19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1215.eqiad.wmnet with reason: host reimage
19:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1222']
19:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1221']
19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1220']
19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1219']
19:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1003.wikimedia.org with OS bullseye
19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1214.eqiad.wmnet with OS bullseye
19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1215.eqiad.wmnet with OS bullseye
19:08 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:08 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1213.eqiad.wmnet with OS bullseye
19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:04 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:04 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1220']
18:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1219']
18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS bullseye
18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:52 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:49 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
18:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
18:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
18:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
18:33 dduvall@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.2 refs T330208
18:32 SandraEbele: started Airflow mediawiki wikitext dags after killing oozie jobs as part of Migration task
18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1218']
18:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1217']
18:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:31 SandraEbele: Killed Oozie mediawiki-wikitext-history-coord and mediawiki-wikitext-current-coord
18:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1213.eqiad.wmnet with OS bullseye
18:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:23 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@5355ead]: (no justification provided) (duration: 00m 12s)
18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS bullseye
18:22 ebysans@deploy2002: Started deploy [airflow-dags/analytics@5355ead]: (no justification provided)
18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bullseye
18:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1211.eqiad.wmnet with OS bullseye
18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
18:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
18:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
17:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1211.eqiad.wmnet with reason: host reimage
17:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1211.eqiad.wmnet with reason: host reimage
17:49 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bullseye
17:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1211.eqiad.wmnet with OS bullseye
17:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1207.eqiad.wmnet with OS bullseye
17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
17:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1218']
17:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1217']
17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
17:30 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
17:29 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
17:28 SandraEbele: killed Oozie mediawiki-history-check_denormalize job and started Airflow mediawiki_history_check_denormalize dag.
17:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1216']
17:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1215']
17:27 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@8b242c2]: (no justification provided) (duration: 00m 11s)
17:27 ebysans@deploy2002: Started deploy [airflow-dags/analytics@8b242c2]: (no justification provided)
17:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bullseye
17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage
17:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage
17:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1216']
17:08 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:07 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:07 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1214']
17:06 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:05 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:04 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1215']
17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1213']
17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1207.eqiad.wmnet with OS bullseye
17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1214']
16:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1213']
16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1212']
16:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1211']
16:21 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2003-dev
16:20 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2003-dev
16:20 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2004-dev
16:20 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2004-dev
16:19 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2003-dev
16:19 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2003-dev
16:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1212']
16:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1211']
16:09 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1209']
16:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
16:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1208']
16:01 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2001-dev
16:01 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2001-dev
16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1207']
15:45 cstone: SmashPig upgraded from 240c80a2 to e86b0a66
15:44 mutante: phabricator maintenance window / deployment ended (T329974)
15:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1208']
15:36 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2001-dev
15:36 brennen@deploy2002: Finished deploy [phabricator/deployment@9f0866e]: deploy to phab1004 for T333516 (duration: 00m 42s)
15:36 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2001-dev
15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
15:35 brennen@deploy2002: Started deploy [phabricator/deployment@9f0866e]: deploy to phab1004 for T333516
15:34 brennen@deploy2002: Finished deploy [phabricator/deployment@9f0866e]: test deploy to phab2002 for T333516 (duration: 00m 30s)
15:34 volans: upgraded spicerack to v6.4.1 on the cumin hosts
15:34 brennen@deploy2002: Started deploy [phabricator/deployment@9f0866e]: test deploy to phab2002 for T333516
15:34 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2002-dev
15:33 mutante: phabricator maintenance / deploy window starting
15:33 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2002-dev
15:32 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2002-dev
15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: maintenance
15:32 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2002-dev
15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: maintenance
15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: maintenance
15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: maintenance
15:30 volans: uploaded spicerack_6.4.1 to apt.wikimedia.org bullseye-wikimedia
15:14 lucaswerkmeister-wmde: Deployed security patch for T333569
15:08 lucaswerkmeister-wmde: Deployed security patch for T333569
14:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
14:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
14:43 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:22 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:22 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
14:17 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
14:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
14:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
14:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:36 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
12:32 joal@deploy2002: Finished deploy [airflow-dags/analytics@a6500cf]: Regular analytics weekly train (2nd) HOTFIX [airflow-dags/analytics@a6500cf] (duration: 00m 11s)
12:31 joal@deploy2002: Started deploy [airflow-dags/analytics@a6500cf]: Regular analytics weekly train (2nd) HOTFIX [airflow-dags/analytics@a6500cf]
12:27 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
12:26 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:17 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
12:17 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
12:17 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
12:15 ladsgroup@deploy2002: Finished scap: Backport for Set externallinks to WRITE BOTH everywhere (T321662) (duration: 14m 58s)
12:08 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
12:02 ladsgroup@deploy2002: ladsgroup: Backport for Set externallinks to WRITE BOTH everywhere (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
12:00 ladsgroup@deploy2002: Started scap: Backport for Set externallinks to WRITE BOTH everywhere (T321662)
11:57 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:50 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker1149-56 - jclark@cumin1001"
11:49 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker1149-56 - jclark@cumin1001"
11:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
11:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for rest-gateway - hnowlan@cumin1001"
11:11 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for rest-gateway - hnowlan@cumin1001"
11:10 ladsgroup@deploy2002: Finished scap: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800) (duration: 07m 59s)
11:08 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
11:03 ladsgroup@deploy2002: ladsgroup: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:03 claime: Re-enabling puppet for cp-text - T331318
11:02 ladsgroup@deploy2002: Started scap: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800)
10:58 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
10:58 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P45994 and previous config saved to /var/cache/conftool/dbconfig/20230330-105011-ladsgroup.json
10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1136 T333538', diff saved to https://phabricator.wikimedia.org/P45993 and previous config saved to /var/cache/conftool/dbconfig/20230330-104928-ladsgroup.json
10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1181 to s7 primary T333538', diff saved to https://phabricator.wikimedia.org/P45992 and previous config saved to /var/cache/conftool/dbconfig/20230330-104617-ladsgroup.json
10:45 Amir1: Starting s7 eqiad failover from db1136 to db1181 - T333538
10:44 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
10:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P45989 and previous config saved to /var/cache/conftool/dbconfig/20230330-103506-ladsgroup.json
10:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
10:27 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1181 with weight 0 T333538', diff saved to https://phabricator.wikimedia.org/P45988 and previous config saved to /var/cache/conftool/dbconfig/20230330-102012-ladsgroup.json
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P45987 and previous config saved to /var/cache/conftool/dbconfig/20230330-102002-ladsgroup.json
10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T333538
10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T333538
10:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P45985 and previous config saved to /var/cache/conftool/dbconfig/20230330-100457-ladsgroup.json
09:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:48 joal@deploy2002: Finished deploy [airflow-dags/analytics@b7b41ae]: Regular analytics weekly train (2nd) [airflow-dags/analytics@b7b41ae] (duration: 00m 11s)
09:47 joal@deploy2002: Started deploy [airflow-dags/analytics@b7b41ae]: Regular analytics weekly train (2nd) [airflow-dags/analytics@b7b41ae]
09:44 claime: Re-enabling puppet for cp-text_ulsfo - T331318
09:36 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd] (hadoop-test): Regular analytics weekly train (2nd) TEST [analytics/refinery@359f4bd] (duration: 01m 28s)
09:35 joal@deploy2002: Started deploy [analytics/refinery@359f4bd] (hadoop-test): Regular analytics weekly train (2nd) TEST [analytics/refinery@359f4bd]
09:35 claime: Re-enabling puppet for cp4037 - T331318
09:34 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd] (thin): Regular analytics weekly train (2nd) THIN [analytics/refinery@359f4bd] (duration: 00m 08s)
09:34 joal@deploy2002: Started deploy [analytics/refinery@359f4bd] (thin): Regular analytics weekly train (2nd) THIN [analytics/refinery@359f4bd]
09:33 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd]: Regular analytics weekly train (2nd) [analytics/refinery@359f4bd] (duration: 05m 53s)
09:28 joal@deploy2002: Started deploy [analytics/refinery@359f4bd]: Regular analytics weekly train (2nd) [analytics/refinery@359f4bd]
09:23 claime: Re-enabling puppet for A:cp-upload - T331318
09:16 claime: Running puppet on cp2028.codfw.wmnet (cp-upload noop test) - T331318
09:15 claime: puppet disabled for A:cp-upload - T331318
09:12 claime: puppet disabled for A:cp-text - T331318
09:09 claime: Merging mw-on-k8s ATS lua routing script - T331318
09:04 godog: silence LogstashIndexingFailures during investigation T180051
08:55 elukey: move kafka main clusters to new truststore (PKI+Puppet root CA certs) - T319372
08:54 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
00:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
00:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
00:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
00:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
00:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
00:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED
00:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
00:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
00:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
00:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
00:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED

2023-03-29

23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
23:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
23:48 mutante: contint2002 - a2dismod mpm_event (ONCE AGAIN this year old issue when applying roles with apache for the first time) - running puppet - now it can actually install PHP 7.3 and start apache T324659
23:48 mutante: contint2002 - a2dismod mpm_event (ONCE AGAIN this year old issue when applying roles with apache for the first time) - running puppet - now it can actually install PHP 7.3 and start apache
23:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
23:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
22:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
22:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
22:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
22:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
22:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
22:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
22:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
22:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
22:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
21:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
21:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
21:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
21:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
21:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
21:45 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads (duration: 00m 14s)
21:45 ebernhardson@deploy2002: Started deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads
21:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
21:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
21:15 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
20:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
20:29 taavi@deploy2002: Finished scap: Backport for Add per-action component-level profiling in statsd using excimer (T225968) (duration: 11m 52s)
20:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
20:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
20:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073']
20:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073']
20:18 taavi@deploy2002: aaron and taavi: Backport for Add per-action component-level profiling in statsd using excimer (T225968) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:17 taavi@deploy2002: Started scap: Backport for Add per-action component-level profiling in statsd using excimer (T225968)
20:15 taavi@deploy2002: Finished scap: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681) (duration: 09m 45s)
20:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
20:10 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
20:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
20:07 taavi@deploy2002: nray and taavi: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:06 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
20:06 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
20:05 taavi@deploy2002: Started scap: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681)
20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
20:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
20:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
19:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
19:50 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
19:48 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
19:20 sukhe: force puppet agent run on A:lvs to additionally confirm nothing broke
19:20 sukhe: [enable] puppet on A:lvs to roll out pybal prometheus-client change
19:14 sukhe: disable puppet on A:lvs to roll out pybal prometheus-client change
18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
18:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1138 T333480', diff saved to https://phabricator.wikimedia.org/P45981 and previous config saved to /var/cache/conftool/dbconfig/20230329-185431-ladsgroup.json
18:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary T333480', diff saved to https://phabricator.wikimedia.org/P45980 and previous config saved to /var/cache/conftool/dbconfig/20230329-185125-ladsgroup.json
18:50 Amir1: Starting s4 eqiad failover from db1138 to db1160 - T333480
18:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
18:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
18:45 dduvall@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.2 refs T330208 (duration: 05m 48s)
18:39 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.2 refs T330208
18:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:38 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3 (duration: 00m 16s)
18:38 ebernhardson@deploy2002: Started deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3
18:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
18:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
18:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T333480', diff saved to https://phabricator.wikimedia.org/P45979 and previous config saved to /var/cache/conftool/dbconfig/20230329-182536-ladsgroup.json
18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T333480
18:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T333480
18:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
18:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:16 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.2 refs T330208
18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
17:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
17:43 brett: Re-enable puppet on A:cp - T284555
17:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
17:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
17:29 brett: Disable puppet on A:cp to roll out another T284555
17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
17:18 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
17:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
17:11 brett: Re-enable puppet on A:cp - T284555
16:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
16:44 brett: Disable puppet on A:cp to roll out T284555
16:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
16:29 btullis@cumin1001: Added views for new wiki: anpwiki T332458
16:05 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
15:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
15:58 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:51 btullis@cumin1001: Added views for new wiki: gucwiki T326235
15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
15:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:28 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:27 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:27 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
15:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
15:27 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
15:26 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:07 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
15:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:01 jgleeson: SmashPig upgraded from 758a34c1 to 240c80a2
15:01 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive. (duration: 00m 13s)
15:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive.
14:59 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
14:58 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
14:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
14:57 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:49 XioNoX: Remove custom BGP graceful-shutdown on all core routers - T320230
14:47 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
14:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
14:15 Lucas_WMDE: UTC afternoon backport+config window done
14:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) (duration: 07m 30s)
14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
14:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
14:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
14:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
14:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)
14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
14:04 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) (duration: 08m 02s)
14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
14:00 XioNoX: merge/deploy change in Puppet's modules/network/data/data.yaml - T327930
13:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:56 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)
13:56 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:54 jgiannelos@deploy2002: Finished deploy [restbase/deploy@0d2f12f]: (no justification provided) (duration: 17m 59s)
13:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:46 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:42 elukey: run dist-upgrade on kafka-main2002 to upgrade it to bullseye - T332013
13:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript cleanupTitles.php gurwiki # T332241 (2 of 767 rows updated)
13:37 sukhe: enable puppet on A:lvs to test Python 2 deprecation change: T321309
13:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@0d2f12f]: (no justification provided)
13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php gurwiki --fix # T332241 – 0 pages to fix (0 resolvable), 0 links to fix (0 resolvable, 0 deleted)
13:30 XioNoX: enable vcp-snmp-statistics on fasw-c-codfw
13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enabled native gallery editing in Parsoid (T329662) (duration: 10m 19s)
13:29 sukhe: disable puppet on A:lvs to test Python 2 deprecation change: T321309
13:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and arlolra: Backport for Enabled native gallery editing in Parsoid (T329662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:19 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enabled native gallery editing in Parsoid (T329662)
13:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable history page visual diffs on remaining wikis (T314588) (duration: 08m 23s)
13:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:10 dcausse@deploy2002: Finished deploy [airflow-dags/search@92e9876]: (no justification provided) (duration: 00m 14s)
13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Enable history page visual diffs on remaining wikis (T314588) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:09 dcausse@deploy2002: Started deploy [airflow-dags/search@92e9876]: (no justification provided)
13:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable history page visual diffs on remaining wikis (T314588)
13:01 XioNoX: test enabling lldp on mr1-ulsfo
12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:55 XioNoX: test enabling lldp on pfw3-codfw
12:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
12:43 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
12:22 btullis@cumin1001: Added views for new wiki: gurwiki T327841
11:57 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:55 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
11:55 btullis@cumin1001: Added views for new wiki: shnwikivoyage T302798
11:55 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:54 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
11:54 btullis@cumin1001: Added views for new wiki: guwwiktionary T309056
11:54 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:53 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
11:53 btullis@cumin1001: Added views for new wiki: guwwiki T303761
11:53 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
11:51 btullis@cumin1001: Added views for new wiki: kcgwiki T305280
11:51 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:18 jgiannelos@deploy2002: deploy aborted: (no justification provided) (duration: 00m 01s)
11:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@c265f3f] (beta): (no justification provided)
11:12 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
11:07 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
10:58 claime: authdns-update successful on all nodes - T333120
10:57 claime: Running authdns-update
10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int,name=codfw
10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro
10:52 claime: Running puppet on dns-auth - T333120
10:50 claime: Switching mw-api-int to production - T333120
10:50 claime: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
10:46 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
10:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)
10:41 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)
10:37 claime: Switching mw-api-int to lvs_setup - T333120
10:21 hnowlan@deploy2002: Finished deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki T332093 T332379 (duration: 19m 30s)
10:02 hnowlan@deploy2002: Started deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki T332093 T332379
09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:58 claime: running puppet on O:kubernetes::worker and O:lvs::balancer - T333120
09:58 denisse: updating prometheus3001 to bullseye
09:57 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:57 claime: Adding mw-api-int to service_catalog in service_setup - T333120
09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:54 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
09:33 filippo@deploy2002: Finished scap: Backport for Revert "Failover statsd to graphite2004" (duration: 07m 34s)
09:27 filippo@deploy2002: filippo: Backport for Revert "Failover statsd to graphite2004" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
09:26 filippo@deploy2002: Started scap: Backport for Revert "Failover statsd to graphite2004"
09:02 elukey: move kafka on kafka-jumbo1001 to PKI TLS certs - T296064
09:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
08:03 volans: installed spicerack v6.4.0 on cumin1001
07:37 kartik@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995) (duration: 12m 35s)
07:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
07:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
07:31 oblivian@deploy2002: Finished deploy [restbase/deploy@11477d6]: Updating stale nodes, T333069 (duration: 32m 07s)
07:27 volans: installed spicerack v6.4.0 on cumin2002
07:26 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:25 kartik@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995)
07:07 slyngs: Update Squid logformat (urldownloader[1001-1002,2001-2002,2004].wikimedia.org)
06:59 oblivian@deploy2002: Started deploy [restbase/deploy@11477d6]: Updating stale nodes, T333069
06:47 hashar: Restarted Gerrit
06:43 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 10s)
06:43 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
06:42 hashar: gerrit2002: restarted Gerrit replica instance
06:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 06s)
06:40 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
06:38 phedenskog@deploy2002: Finished deploy [performance/navtiming@f6c9fa3]: (no justification provided) (duration: 00m 05s)
06:38 phedenskog@deploy2002: Started deploy [performance/navtiming@f6c9fa3]: (no justification provided)
06:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
06:21 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
00:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet
00:37 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet
00:30 sukhe: restart pybal on lvs1018 to hopefully resolve flapping BGP session
00:06 zabe@deploy2002: Finished scap: Backport for throttle: Remove expired throttle (duration: 07m 19s)
00:00 zabe@deploy2002: zabe: Backport for throttle: Remove expired throttle synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet

2023-03-28

23:59 zabe@deploy2002: Started scap: Backport for throttle: Remove expired throttle
23:46 zabe@deploy2002: Finished scap: T331831 (duration: 06m 50s)
23:39 zabe@deploy2002: Started scap: T331831
23:34 zabe@deploy2002: Finished scap: T331831 (duration: 07m 01s)
23:27 zabe@deploy2002: Started scap: T331831
23:27 zabe: Central Kurdish Wiktionary (ckbwiktionary)
22:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
22:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
22:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
22:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
21:44 eileen: civicrm upgraded from db3b727e to 183d131d
21:23 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments (duration: 00m 13s)
21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments
21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
21:06 urandom: updating image_suggestions default table TTL(s) from 1209600 to 1814400 (seconds) — T333319
21:05 phedenskog@deploy2002: Finished deploy [performance/navtiming@4d22874]: (no justification provided) (duration: 00m 06s)
21:05 phedenskog@deploy2002: Started deploy [performance/navtiming@4d22874]: (no justification provided)
21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
21:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
21:03 urbanecm@deploy2002: Finished scap: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733) (duration: 28m 53s)
20:51 urbanecm@deploy2002: urbanecm and matmarex: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:34 urbanecm@deploy2002: Started scap: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733)
20:27 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes (duration: 00m 13s)
20:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes
20:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
20:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
20:02 ejegg: payments-wiki upgraded from f5ec2677 to b5df483f
19:29 dduvall@deploy2002: Pruned MediaWiki: 1.40.0-wmf.27 (duration: 02m 11s)
19:26 dduvall@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs T330208 (duration: 07m 24s)
19:19 dduvall@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
18:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:37 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance (duration: 00m 20s)
18:36 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance
18:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
18:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=ats-be
16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=cdn
16:52 volans: uploaded spicerack_6.4.0 to apt.wikimedia.org bullseye-wikimedia (but I'll deploy it to the cumin hosts tomorrow)
16:10 jnuche@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs T330208 (duration: 49m 52s)
16:09 bblack: reboot cp1082 (NIC issues)
16:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=ats-be
16:03 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=cdn
16:00 inflatador: bking@cumin1001 unban elastic and cloudelastic nodes post maintenance T330165
15:57 btullis@deploy2002: Finished deploy [analytics/refinery@6554ec0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6554ec0] (duration: 01m 32s)
15:20 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
15:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:15 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:14 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:08 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-client1002.eqiad.wmnet with OS bullseye
15:05 jnuche@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.41.0-wmf.2" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 03s)
15:05 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
14:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=eqiad
14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=device-analytics,name=eqiad
14:53 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=device-analytics,name=eqiad
14:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
14:51 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row B switches upgrade done - T330165
14:48 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
14:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor100[12].eqiad.wmnet
14:38 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
14:32 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches upgrade done - T330165
14:31 sukhe: run authdns-update to revert eqiad depool
14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=THANOS-FE-OLD-FQDN,service=thanos-web
14:05 XioNoX: reboot eqiad row B for upgrade - T330165
13:58 godog: depool thanos-fe1002 - T330165
13:54 Emperor: depool ms-fe1010 before switch work T330165
13:53 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
13:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
13:47 akosiaris: depool swift in eqiad for row B upgrade
13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
13:46 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
13:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:45 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:44 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
13:41 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
13:36 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:34 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor,name=eqiad
13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1002.eqiad.wmnet
13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
13:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
13:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row B switches upgrade - T330165
12:59 XioNoX: depool eqiad for network maintenance - T330165
12:58 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches upgrade - T330165
12:57 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
12:56 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
12:56 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
12:56 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
12:44 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
12:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
12:43 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
12:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
12:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
12:36 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye
12:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
12:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
12:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
12:21 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
12:20 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
12:20 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
12:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295
12:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 45295
12:09 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye
11:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
11:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
11:56 elukey: dist-upgrade kafka-main1002 to debian bullseye - T332013
11:51 ladsgroup@deploy2002: Finished scap: Backport for api: Mark query as read-only to avoid regex on SQL (T332942) (duration: 18m 42s)
11:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:37 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:34 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:34 ladsgroup@deploy2002: ladsgroup: Backport for api: Mark query as read-only to avoid regex on SQL (T332942) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:32 ladsgroup@deploy2002: Started scap: Backport for api: Mark query as read-only to avoid regex on SQL (T332942)
11:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:22 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:24 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:24 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
10:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
09:56 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
09:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
09:41 vgutierrez: resetting cp2035 management card - T333312
09:38 elukey: dist-upgrade kafka-main1001 to bullseye - T332013
09:36 godog: silence systemdunitfailed alerts for team=wmcs - T333315
09:35 vgutierrez: depool cp2035 - T333312
09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
09:12 jbond@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts
09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts
09:11 jbond@cumin1001: END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
08:58 vgutierrez: restart ipmiseld on cp2035
08:50 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
08:49 ayounsi@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:48 AndyRussG: update payments.wiki config 65bedd4a -> e31ffd7d, payments (automatic updates only) a6c6c2b1 -> f5ec2677
08:45 ayounsi@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:43 ayounsi@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:42 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
08:39 ayounsi@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:37 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:35 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:34 ayounsi@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
08:32 ayounsi@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
08:32 phedenskog@deploy2002: Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s)
08:32 phedenskog@deploy2002: Started deploy [performance/navtiming@e757bdf]: (no justification provided)
08:31 ayounsi@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
08:29 ayounsi@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
08:25 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:21 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
08:14 ayounsi@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
08:11 oblivian@deploy2002: Finished scap: Backport for Failover statsd to graphite2004 (T330165) (duration: 08m 48s)
08:08 ayounsi@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
08:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance
08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance
08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance
08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance
08:04 oblivian@deploy2002: oblivian and filippo: Backport for Failover statsd to graphite2004 (T330165) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
08:03 ayounsi@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
08:02 oblivian@deploy2002: Started scap: Backport for Failover statsd to graphite2004 (T330165)
08:02 ayounsi@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:00 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:00 godog: move graphite reads to codfw - T330165
07:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:56 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:54 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:54 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:51 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:51 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45965 and previous config saved to /var/cache/conftool/dbconfig/20230328-073122-root.json
07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17806
07:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 17806
07:20 kartik@deploy2002: Finished scap: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834) (duration: 12m 05s)
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45964 and previous config saved to /var/cache/conftool/dbconfig/20230328-071617-root.json
07:10 kartik@deploy2002: kartik: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
07:08 kartik@deploy2002: Started scap: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834)
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45963 and previous config saved to /var/cache/conftool/dbconfig/20230328-070112-root.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45962 and previous config saved to /var/cache/conftool/dbconfig/20230328-064607-root.json
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45961 and previous config saved to /var/cache/conftool/dbconfig/20230328-063103-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45960 and previous config saved to /var/cache/conftool/dbconfig/20230328-061558-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 T329481', diff saved to https://phabricator.wikimedia.org/P45959 and previous config saved to /var/cache/conftool/dbconfig/20230328-061441-root.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P45958 and previous config saved to /var/cache/conftool/dbconfig/20230328-060053-root.json
05:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:53 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
05:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
05:47 AndyRussG: update payments-wiki f5e262d1 -> a6c6c2b1
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P45957 and previous config saved to /var/cache/conftool/dbconfig/20230328-054548-root.json
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P45956 and previous config saved to /var/cache/conftool/dbconfig/20230328-053043-root.json
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45955 and previous config saved to /var/cache/conftool/dbconfig/20230328-051539-root.json
01:59 krinkle@deploy2002: Synchronized wmf-config/mc.php: I44edcd (duration: 06m 33s)

2023-03-27

23:47 mutante: people1003 - taking down apache to provoke monitoring alert (inactive instances) and confirm IRC alerting change works
23:31 zabe: deployed patch for T330968
23:08 zabe@deploy2002: Finished scap: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514) (duration: 21m 27s)
23:00 zabe@deploy2002: zabe: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:48 mutante: stat1005 - kill 18179; run puppet ; stat1007 - kill 3346; run puppet ; stat1006 - kill 23887; run puppet
22:47 zabe@deploy2002: Started scap: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514)
22:43 mutante: stat1004 - kill 29291; run puppet
22:43 mutante: apt2001 - kill 3105; run puppet
22:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Meta:WMF Support and Safety" "Meta:WMF Trust and Safety" "Zabe" --reason "per T330514" # T330514
21:58 maryum: Deploy security fix for T326952
21:58 urandom: power cycling restbase1033 — T333243
21:45 ryankemper: T330165 Depooled relevant search platform hosts: `sudo -E cumin 'elastic[1055-1056,1074-1079,1085-1086]*,cloudelastic100[2,6]*,wcqs1002*,wdqs[1007,1012]*' 'sudo depool'`
21:24 Amir1: start of watchlist clean up in arwiki (T328501)
21:23 kindrobot: finish UTC late backports
21:22 kindrobot@deploy2002: Finished scap: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279) (duration: 08m 26s)
21:15 kindrobot@deploy2002: kindrobot and superpes: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:15 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided) (duration: 00m 13s)
21:14 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided)
21:14 kindrobot@deploy2002: Started scap: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279)
21:11 tzatziki: moving Universal Code of Conduct/Enforcement guidelines -> Universal Code of Conduct/Enforcement guidelines/Version 1 on metawiki with `extensions/Translate/scripts/moveTranslatableBundle.php`
20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1022.eqiad.wmnet
20:45 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:45 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:41 andrew@cumin1001: START - Cookbook sre.dns.netbox
20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1022.eqiad.wmnet
20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1021.eqiad.wmnet
20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:33 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:31 andrew@cumin1001: START - Cookbook sre.dns.netbox
20:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1021.eqiad.wmnet
20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1017.eqiad.wmnet
20:25 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:25 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:23 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
20:21 andrew@cumin1001: START - Cookbook sre.dns.netbox
20:20 kindrobot@deploy2002: Finished scap: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093) (duration: 10m 50s)
20:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1017.eqiad.wmnet
20:11 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:10 kindrobot@deploy2002: Started scap: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)
20:01 kindrobot: start UTC late backport window
19:21 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2 (duration: 00m 14s)
19:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2
19:06 jgleeson: civicrm upgraded from 09373b9d to db3b727e
16:40 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:40 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:39 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:39 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:34 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:34 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:33 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:25 jgleeson: payments-wiki upgraded from 36366f64 to f5e262d1
15:55 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided) (duration: 00m 11s)
15:54 ebysans@deploy2002: Started deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided)
15:20 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
15:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
15:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
15:17 elukey@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 10s)
15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict1002.eqiad.wmnet
14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict1002.eqiad.wmnet on all recursors
14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict1002.eqiad.wmnet on all recursors
14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
14:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
14:52 eoghan@cumin1001: START - Cookbook sre.dns.netbox
14:52 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host aphlict1002.eqiad.wmnet
14:48 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:48 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:47 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
14:47 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
14:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:44 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:40 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
14:29 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
14:29 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
14:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
14:28 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
14:27 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:16 taavi: taavi@mwmaint2002 ~ $ mwscript namespaceDupes.php --wiki=huwiki --fix # T333083
14:15 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:15 taavi@deploy2002: Finished scap: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166) (duration: 08m 27s)
14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:14 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:08 taavi@deploy2002: taavi: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:06 taavi@deploy2002: Started scap: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166)
13:35 fab@deploy2002: Finished deploy [airflow-dags/research@d2c115d]: (no justification provided) (duration: 00m 21s)
13:35 fab@deploy2002: Started deploy [airflow-dags/research@d2c115d]: (no justification provided)
13:12 taavi@deploy2002: Finished scap: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083) (duration: 08m 45s)
13:04 taavi@deploy2002: superpes and taavi: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:03 taavi@deploy2002: Started scap: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083)
12:42 godog: flip alert* to overlay2 - T329939
11:55 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
10:31 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:30 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:10 elukey: dist-upgrade kafka-main1003 manually to bullseye - T332013
10:03 Emperor: depool ms-fe2009
09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
09:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45295
09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45295
09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:39 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
08:57 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
08:55 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
08:47 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
08:39 ladsgroup@deploy1002: Finished scap: Backport for EntityUsageTable: Mark query as read-only (T332941) (duration: 18m 15s)
08:30 ladsgroup@deploy1002: ladsgroup: Backport for EntityUsageTable: Mark query as read-only (T332941) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:28 jynus: restarting bacula at backup1001 T331510
08:25 urbanecm@deploy2002: Synchronized wmf-config/InitialiseSettings.php: 63dd23b: [Growth] eswiki: Enable mentorship for 50% of newcomers (T332737, T285235) (duration: 06m 09s)
08:21 ladsgroup@deploy1002: Started scap: Backport for EntityUsageTable: Mark query as read-only (T332941)
08:18 urbanecm@deploy2002: Backport cancelled.
08:06 urbanecm@deploy2002: Finished scap: Backport for GrowthMentors.json: Add a write-only username field (T331444) (duration: 07m 52s)
08:03 marostegui: Failover m1 from db1164 to db1101 - T331510
08:00 urbanecm@deploy2002: urbanecm: Backport for GrowthMentors.json: Add a write-only username field (T331444) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
07:58 urbanecm@deploy2002: Started scap: Backport for GrowthMentors.json: Add a write-only username field (T331444)
07:55 urbanecm@deploy2002: Finished scap: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075) (duration: 16m 45s)
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45949 and previous config saved to /var/cache/conftool/dbconfig/20230327-075206-root.json
07:48 urbanecm@deploy2002: urbanecm: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:39 jynus: disabling puppet and shutting down bacula at backup1001 T331510
07:38 urbanecm@deploy2002: Started scap: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45948 and previous config saved to /var/cache/conftool/dbconfig/20230327-073701-root.json
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45947 and previous config saved to /var/cache/conftool/dbconfig/20230327-072156-root.json
07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45946 and previous config saved to /var/cache/conftool/dbconfig/20230327-070651-root.json
06:51 marostegui: dbmaint s3 eqiad Rename flaggedrevs tables on db1123 ptwikisource T332594
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45945 and previous config saved to /var/cache/conftool/dbconfig/20230327-065147-root.json
06:40 marostegui: Rename flaggedrevs tables on db1123 ptwikisource T332594
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45944 and previous config saved to /var/cache/conftool/dbconfig/20230327-063642-root.json
05:40 kart_: Updated cxserver to 2023-03-17-133444-production (T332379 + build changes)
05:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:37 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:24 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:23 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T332292', diff saved to https://phabricator.wikimedia.org/P45942 and previous config saved to /var/cache/conftool/dbconfig/20230327-051941-root.json
05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510
05:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510

2023-03-25

07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s)
07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0
00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host
00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer
00:09 tzatziki: removing 2 files for legal compliance

2023-03-24

23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
23:50 tzatziki: removing 1 file for legal compliance
21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after I4fe17f which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. T328907
18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s)
18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable
18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s)
18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags
18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s)
18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag
17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s)
17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag
15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request T332917" # T332917
11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:00 marostegui: Upgrade db1204 to mariadb 10.6 T330861
08:57 hashar: Fixed up Gerrit > GitHub replication, which broke at 5:00 UTC, by updating the GitHub RSA ssh host key T332972
05:37 hashar: gerrit: refreshed ssh host key for `github.com`
05:28 hashar: Restarted Gerrit
05:26 hashar: Stopping Gerrit
05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068) (duration: 00m 10s)
05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068)
05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org
05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068) (duration: 00m 07s)
05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068)
05:17 hashar: Restarting Gerrit for deploying plugins updates
05:10 ejegg: Standalone SmashPig upgraded from 3b84e4cb to 50139e82
05:04 ejegg: payments-wiki upgraded from 4d0c90b4 to 4b0a71fa
00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply

2023-03-23

22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
22:30 mutante: moscovium - rebooting to finalize distro release upgrade - T332952
22:20 mutante: moscovium performing apt-get full-upgrade T332952
22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and replacing the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines; the need for this is described as a bug at https://shagain.club/index.php/archives/641/ - T327068
22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) T327068
22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) T327068
21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
20:33 taavi@deploy2002: Finished scap: Backport for MessageWebImporter: Use translation instead of language code on import (T323430) (duration: 10m 56s)
20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
20:24 taavi@deploy2002: abi and taavi: Backport for MessageWebImporter: Use translation instead of language code on import (T323430) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:23 taavi@deploy2002: Started scap: Backport for MessageWebImporter: Use translation instead of language code on import (T323430)
19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1 refs T330207
17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
17:38 mutante: moscovium - systemctl stop rsync
17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
16:59 sukhe: rolling out CR 901333 to A:cp-text T313578
16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - T332796
15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
13:46 TheresNoTime: close UTC afternoon backport window
13:45 samtar@deploy2002: Finished scap: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759) (duration: 07m 46s)
13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
13:39 samtar@deploy2002: samtar: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:37 samtar@deploy2002: Started scap: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)
13:36 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: disable add a link backend (T304551) (duration: 08m 05s)
13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
13:29 samtar@deploy2002: samtar and sgimeno: Backport for GrowthExperiments: disable add a link backend (T304551) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:28 samtar@deploy2002: Started scap: Backport for GrowthExperiments: disable add a link backend (T304551)
13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` T332470
13:25 samtar@deploy2002: Finished scap: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470) (duration: 08m 39s)
13:18 samtar@deploy2002: samtar and superpes: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:16 samtar@deploy2002: Started scap: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470)
13:15 samtar@deploy2002: Finished scap: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813) (duration: 11m 47s)
13:08 samtar@deploy2002: samtar and superpes: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:03 samtar@deploy2002: Started scap: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)
12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - T332796
11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia T332584 T332589
08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - T332803
08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - T332796
08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - T332796
07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - T332803
07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - T332803
07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for T331896 - no automatic sync / deploy for these
01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - T332812"
01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - T332812"
00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye

2023-03-22

23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # T332782
23:31 zabe@deploy2002: Finished scap: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782) (duration: 10m 03s)
23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
23:22 zabe@deploy2002: zabe: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
23:21 zabe@deploy2002: Started scap: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782)
21:15 taavi: UTC late backports complete
21:13 taavi@deploy2002: Finished scap: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031) (duration: 07m 29s)
21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
21:08 taavi@deploy2002: taavi: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:06 taavi@deploy2002: Started scap: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)
21:05 taavi@deploy2002: Finished scap: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS (duration: 07m 17s)
20:59 taavi@deploy2002: taavi: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:58 taavi@deploy2002: Started scap: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS
20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable page tools for anonymous users (T331052) (duration: 10m 10s)
20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for T332803
20:36 samtar@deploy2002: Finished scap: Backport for Enable pinning for anon main menu when page tools is enabled (T331657) (duration: 11m 47s)
20:32 akosiaris: reboot kubernetes1023 for a test once more, T332803
20:32 akosiaris: reboot kubernetes1023 for a test once more
20:28 samtar@deploy2002: samtar and nray: Backport for Enable pinning for anon main menu when page tools is enabled (T331657) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:25 akosiaris: reboot kubernetes1023 for a test
20:24 samtar@deploy2002: Started scap: Backport for Enable pinning for anon main menu when page tools is enabled (T331657)
20:23 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813) (duration: 09m 57s)
20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:15 samtar@deploy2002: kharlan and samtar: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:13 samtar@deploy2002: Started scap: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)
20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
20:09 samtar@deploy2002: Finished scap: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config (duration: 07m 22s)
20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:04 samtar@deploy2002: samtar and matmarex: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
20:02 samtar@deploy2002: Started scap: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config
20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1 refs T330207
18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running initial scap or automatic rsync; this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ T331896 - but also see T332101
17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # T332121 (duration: 06m 44s)
17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - T332796
16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia T332584 T332589
15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
15:22 hnowlan: removing java packages from maps hosts
15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
15:13 hnowlan: removing cassandra packages from maps hosts
15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) T323961
11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted T323961', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
11:02 jbond: upgrade prometheus-ipmi-exporter on buster and bullseye
10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh one works fine)
10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
00:50 zabe@deploy2002: Started scap: update interwiki cache
00:47 zabe@deploy2002: Finished scap: T332115 (duration: 06m 56s)
00:40 zabe@deploy2002: Started scap: T332115
00:40 zabe: create Wikipedia Angika (anpwiki) # T332115
00:38 zabe@deploy2002: Finished scap: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118) (duration: 27m 00s)
00:29 zabe@deploy2002: zabe: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
00:11 zabe@deploy2002: Started scap: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118)

2023-03-21

23:46 zabe@deploy2002: Finished scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) (duration: 30m 08s)
23:35 zabe@deploy2002: zabe: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
23:15 zabe@deploy2002: Started scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)
23:07 zabe@deploy2002: Finished scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers" (duration: 07m 10s)
23:00 zabe@deploy2002: Started scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers"
22:47 ejegg: payments-wiki upgraded from 0fd66b1f to ab0a55a2
22:10 urbanecm@deploy2002: Finished scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) (duration: 07m 15s)
22:04 urbanecm@deploy2002: urbanecm: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:03 urbanecm@deploy2002: Started scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)
21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
21:02 AndyRussG: update SmashPig config 6e651fd4 -> 035f602a
20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
20:48 taavi: start T315510 migration script on group2 s7 wikis
20:39 taavi@deploy2002: Finished scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config (duration: 09m 01s)
20:31 taavi@deploy2002: matmarex and taavi: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:30 taavi@deploy2002: Started scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config
20:20 taavi@deploy2002: Finished scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) (duration: 17m 40s)
20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:02 taavi@deploy2002: Started scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)
19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1 refs T330207
18:00 AndyRussG: update SmashPig config 59a8b2d2 -> 6e651fd
17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:42 jbond: stop puppet from deploying this further
15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
15:26 samtar@deploy2002: Finished scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) (duration: 09m 11s)
15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:19 samtar@deploy2002: samtar: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:17 samtar@deploy2002: Started scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)
15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
15:10 samtar@deploy2002: Finished scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) (duration: 09m 32s)
15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
15:02 samtar@deploy2002: samtar: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
15:00 samtar@deploy2002: Started scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)
14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
14:38 hnowlan: disabling puppet on maps* before merging 760619
14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:10 urbanecm@deploy2002: Finished scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443) (duration: 07m 53s)
14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
14:02 urbanecm@deploy2002: Started scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)
14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - T319372
11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
11:08 joal: Kill mediacounts_load oozie job
11:07 joal: Unpause mediawiki_history_denormalize airflow job
11:06 joal: Kill mediawiki_denormalize oozie job
11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372
06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs T330207 (duration: 52m 38s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs T330207

2023-03-20

22:00 samtar@deploy2002: Finished scap: Backport for Add languages to Minerva HTML (T331905) (duration: 09m 45s)
21:52 samtar@deploy2002: jdlrobson and samtar: Backport for Add languages to Minerva HTML (T331905) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:50 samtar@deploy2002: Started scap: Backport for Add languages to Minerva HTML (T331905)
21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` T332614
21:25 TheresNoTime: closing UTC late backport window, extended
21:22 samtar@deploy2002: Finished scap: Backport for Rename project and project talk namespace for shwiki (T332614) (duration: 12m 22s)
21:11 samtar@deploy2002: samtar and aleksandar: Backport for Rename project and project talk namespace for shwiki (T332614) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:10 samtar@deploy2002: Started scap: Backport for Rename project and project talk namespace for shwiki (T332614)
21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
21:09 samtar@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) (duration: 08m 34s)
21:02 samtar@deploy2002: matmarex and samtar: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:00 TheresNoTime: extending UTC late backport window
21:00 samtar@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407)
20:58 kharlan@deploy2002: Finished scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) (duration: 10m 28s)
20:49 kharlan@deploy2002: kharlan: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
20:47 kharlan@deploy2002: Started scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309)
19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) T257317 T332623 T331896
19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
18:56 ejegg: switched back to new PayPal pending transaction resolver
18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
18:47 akosiaris: emergency rollover of redis password complete
18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
18:42 ejegg: civicrm upgraded from 3d3606f1 to 09373b9d
18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost (T331896)
18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP (T196968)
17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:11 TheresNoTime: close UTC afternoon backport window
14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` T331762
14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` T331762
14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` T332351
14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` T331762
14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") T331762
14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` T331762
13:58 samtar@deploy2002: Finished scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) (duration: 09m 44s)
13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:49 samtar@deploy2002: Started scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762)
13:47 samtar@deploy2002: Finished scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) (duration: 09m 26s)
13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
13:39 samtar@deploy2002: aleksandar and samtar: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:38 samtar@deploy2002: Started scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468)
13:37 samtar@deploy2002: Finished scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) (duration: 08m 46s)
13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843 (duration: 01m 30s)
13:29 samtar@deploy2002: stang and samtar: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843
13:28 samtar@deploy2002: Started scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439)
13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843 (duration: 01m 39s)
13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843
13:18 samtar@deploy2002: Finished scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) (duration: 11m 36s)
13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:08 samtar@deploy2002: stang and samtar: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:06 samtar@deploy2002: Started scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351)
11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
09:21 claime: Repooling parse2004 - T332119
08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915

2023-03-19

18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) 27a5b481 -> 6359222d
16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) 5dd37c9c -> 3d3606f1
14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)

2023-03-18

22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
01:21 urandom: powercycling restbase2025 — T332462
00:06 AndyRussG: Updating civicrm from 5dd37c9c to 3d3606f1

2023-03-17

19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - T332119
12:06 mutante: systemctl reset-failed on gitlab-runner*
11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
02:10 ejegg: civicrm upgraded from 672950d9 to 5dd37c9c
01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates

2023-03-16

23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
22:24 ejegg: civicrm upgraded from 68fa85cf to 672950d9
22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs T330205
20:36 brennen: 1.40.0-wmf.27 train (T330205): blockers hopefully resolved, rolling to all wikis
20:35 TheresNoTime: close UTC late backport window
20:35 samtar@deploy2002: Finished scap: Backport for Remove sampling from breadCrumbs schema (duration: 08m 18s)
20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for Remove sampling from breadCrumbs schema synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:26 samtar@deploy2002: Started scap: Backport for Remove sampling from breadCrumbs schema
20:21 brennen@deploy2002: Finished scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) (duration: 09m 06s)
20:14 brennen@deploy2002: brennen and jforrester: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:12 brennen@deploy2002: Started scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)
19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
18:41 wfan: enable monthlyconvert for cz
18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - T326363
15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - T326363
15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:06 urandom: ALTER-ing image_suggestions.suggestion table — T328670
13:35 kostajh: UTC afternoon deploys done
13:34 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag (duration: 07m 44s)
13:28 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:27 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag
13:15 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) (duration: 09m 48s)
13:07 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:05 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813)
12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
08:40 kostajh: UTC morning deploys (second round) done
08:40 kharlan@deploy2002: Finished scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) (duration: 12m 30s)
08:29 kharlan@deploy2002: kharlan: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:27 kharlan@deploy2002: Started scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227)
08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
07:36 tgr_: UTC morning deploys done
07:34 tgr@deploy2002: Finished scap: Backport for Leveling up: Backport recent changes (duration: 08m 13s)
07:28 tgr@deploy2002: tgr: Backport for Leveling up: Backport recent changes synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
07:26 tgr@deploy2002: Started scap: Backport for Leveling up: Backport recent changes
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl T331874', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
06:03 marostegui: Failover m5 from db1106 to db1176 - T332155
05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T332155
05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T332155
03:29 ejegg: payments-wiki upgraded from 1532b107 to 0fd66b1f

2023-03-15

22:55 tzatziki: Removing 1 file for legal compliance
22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 55s)
22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 28s)
22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
21:59 brennen: end of phabricator update window (T331915)
21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 40s)
21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 28s)
21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915) (duration: 00m 52s)
21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915)
21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
21:13 mutante: phab* - upgrading PHP packages
21:13 mutante: phabricator - maintenance window starting - expect possible downtime
21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915) (duration: 00m 31s)
20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915)
20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
20:48 TheresNoTime: close UTC late backport window
20:48 samtar@deploy2002: Finished scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki (duration: 08m 46s)
20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:39 samtar@deploy2002: Started scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki
20:35 samtar@deploy2002: Finished scap: Backport for Deploy action blocks on itwiki (T330533) (duration: 10m 30s)
20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for Deploy action blocks on itwiki (T330533) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:25 samtar@deploy2002: Started scap: Backport for Deploy action blocks on itwiki (T330533)
20:23 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) (duration: 10m 12s)
20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
20:15 samtar@deploy2002: sgimeno and samtar: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:13 samtar@deploy2002: Started scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)
20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
20:11 taavi: deploy patch for T331192
20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
19:49 taavi@deploy2002: Finished scap: Backport for extdist: Add REL1_40 (T329085) (duration: 12m 04s)
19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
19:39 taavi@deploy2002: taavi: Backport for extdist: Add REL1_40 (T329085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
19:37 taavi@deploy2002: Started scap: Backport for extdist: Add REL1_40 (T329085)
19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. (T332115)
18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs T330205 (duration: 06m 08s)
18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs T330205
18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
18:06 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group1.
18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
16:02 hnowlan: restarted thumbor-instances on thumbor1006
16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:53 claime: Redeploying mw-on-k8s for php7.4 update T330270
14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:41 cgoubert@deploy2002: Started scap: (no justification provided)
14:41 claime: Rebuilding mw-on-k8s images - T330270
14:38 claime: Updating php7.4 production images
14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
14:24 daniel@deploy2002: Finished scap: Backport for Always write parsoid output to parser cache. (T320534) (duration: 09m 57s)
14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
14:22 jbond: switch pki to be active/active
14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
14:19 jbond: update pki to use discovery record
14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
14:15 daniel@deploy2002: daniel: Backport for Always write parsoid output to parser cache. (T320534) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
14:14 daniel@deploy2002: Started scap: Backport for Always write parsoid output to parser cache. (T320534)
14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: T321309
14:12 sukhe: depool dns4002 for reimaging to bullseye: T321309
14:00 moritzm: nodejs security updates on buster
13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: T321309
13:49 moritzm: installing graphite-web security updates
13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
13:17 taavi@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) (duration: 09m 01s)
13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
13:08 taavi@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635)
13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
12:18 marostegui: Failover m5 from db1176 to db1106 - T331877
12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T331877
12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T331877
12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
11:20 moritzm: imported packages into thirdparty/ceph-quincy
11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - T331268/25
10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
09:22 moritzm: installing gnutls28 security updates
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T331875', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199 (duration: 00m 19s)
08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199
08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
07:40 tgr_: UTC morning deploys done
07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
07:36 tgr@deploy2002: Finished scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet (duration: 07m 54s)
07:30 tgr@deploy2002: tgr: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:28 tgr@deploy2002: Started scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet
06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) T331874', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
06:20 marostegui: Remove pki2001 from m1 grants T332018

2023-03-14

23:29 brennen@deploy2002: Finished scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205) (duration: 10m 32s)
23:20 brennen@deploy2002: brennen and umherirrender: Backport for action: Restrict action.delete.js to action=delete pages (T330205) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
23:19 brennen@deploy2002: Started scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205)
22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
20:43 ejegg: payments-wiki upgraded from 61c30a4f to 1532b107
20:35 zabe@deploy2002: Finished scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) (duration: 08m 36s)
20:28 zabe@deploy2002: zabe: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:27 zabe@deploy2002: Started scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921)
20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version T327919
19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
19:30 brennen: 1.40.0-wmf.27 train (T330205): uneventful at group0. i'm afk for about an hour.
19:13 ejegg: civicrm upgraded from dbe3b716 to 68fa85cf
18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27 refs T330205
18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
18:03 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group0.
17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
16:47 sukhe: rolling restart of pdns-rec to pick up config changes
16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - T331541
14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
14:00 jbond: reimage pki1001
13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
13:27 TheresNoTime: close UTC afternoon backport window
13:26 samtar@deploy2002: Finished scap: Backport for arwiki: Add new throttle rule (T331973) (duration: 07m 24s)
13:20 samtar@deploy2002: samtar and urbanecm: Backport for arwiki: Add new throttle rule (T331973) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:19 samtar@deploy2002: Started scap: Backport for arwiki: Add new throttle rule (T331973)
13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
13:18 samtar@deploy2002: Finished scap: Backport for Enable VE on more namespaces on foundationwiki (T331079) (duration: 07m 55s)
13:11 samtar@deploy2002: esanders and samtar: Backport for Enable VE on more namespaces on foundationwiki (T331079) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:10 samtar@deploy2002: Started scap: Backport for Enable VE on more namespaces on foundationwiki (T331079)
13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
12:23 moritzm: installing git security updates
12:20 samtar@deploy2002: Finished scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) (duration: 09m 12s)
12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of T297396 + T331680 — scap rolled back
12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: T331541
12:13 samtar@deploy2002: samtar and varnent: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
12:11 samtar@deploy2002: Started scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680)
12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: T331541
12:06 claime: Unlocked scap deployments - T331541
12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: T331541
11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: T331541
11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
11:13 claime: We are encountering unexpected DNS anycast issues following T331541; latencies are increased but there is no production outage.
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T331541
10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T331541
10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
10:42 jbond: reimage pki-root1001
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
10:32 claime: Repooling all active/active services in eqiad - T331541
10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - T331541
10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
10:15 jayme: enabling puppet on P:calico::kubernetes for T325268
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
10:02 claime: Locking scap deployment for service switchover - T331541
10:00 claime: Locking scap deployment for service switchover - T330651
09:56 jayme: disabling puppet on P:calico::kubernetes for T325268
09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:36 moritzm: installing NSS security updates
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:23 Emperor: reboot ms-be2040 T331860
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 T322294
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet
04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122
04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003`
03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s)
03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27 refs T330205 (duration: 51m 02s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27 refs T330205
02:22 legoktm: removed user's 2FA on wikitech for T331955
02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json
02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json
01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json
01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json
01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json
01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json
01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json
00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json
00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json
00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json
00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json
00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json

2023-03-13

23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json
23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json
23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json
23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json
23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json
23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json
22:48 zabe@deploy2002: Finished scap: noc: Switch default selection on db.php from eqiad to codfw (duration: 06m 56s)
22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json
22:41 zabe@deploy2002: Started scap: noc: Switch default selection on db.php from eqiad to codfw
22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s)
22:40 zabe@deploy2002: Started scap: gerrit:898037
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json
22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json
22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s)
22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json
22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json
21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json
21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json
21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json
21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye
21:21 wfan: remove -d for jobs-dlocal queue runner
21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json
21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json
21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
21:01 ejegg: enabled jobs-dlocal queue runner
21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json
20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye
20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s)
20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein
20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json
20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json
20:27 kindrobot: close UTC late backport window
20:26 kindrobot@deploy2002: Finished scap: Backport for Add header at top of main page (T325362) (duration: 12m 11s)
20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json
20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for Add header at top of main page (T325362) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:15 kindrobot: start UTC late backport window
20:14 kindrobot@deploy2002: Started scap: Backport for Add header at top of main page (T325362)
20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json
20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json
19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json
19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json
19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json
19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json
19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json
18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45796 and previous config saved to /var/cache/conftool/dbconfig/20230313-184502-marostegui.json
18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable (duration: 00m 13s)
18:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable
18:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:36 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date (duration: 00m 14s)
18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
18:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date
18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45795 and previous config saved to /var/cache/conftool/dbconfig/20230313-183628-marostegui.json
18:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45794 and previous config saved to /var/cache/conftool/dbconfig/20230313-182121-marostegui.json
18:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
18:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45793 and previous config saved to /var/cache/conftool/dbconfig/20230313-180615-marostegui.json
17:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
17:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45792 and previous config saved to /var/cache/conftool/dbconfig/20230313-175109-marostegui.json
17:50 dancy@deploy2002: Finished scap: test cleanup (duration: 06m 40s)
17:44 dancy@deploy2002: Started scap: test cleanup
17:43 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45791 and previous config saved to /var/cache/conftool/dbconfig/20230313-174030-marostegui.json
17:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45790 and previous config saved to /var/cache/conftool/dbconfig/20230313-174009-marostegui.json
17:35 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45789 and previous config saved to /var/cache/conftool/dbconfig/20230313-172503-marostegui.json
17:22 dancy@deploy2002: Finished scap: testing T329857 (duration: 06m 54s)
17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
17:15 dancy@deploy2002: Started scap: testing T329857
17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:12 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
17:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:11 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
17:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:10 Emperor: roll-restart of codfw eqiad frontends
17:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45788 and previous config saved to /var/cache/conftool/dbconfig/20230313-170955-marostegui.json
17:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:08 dancy@deploy2002: Installation of scap version "4.46.0" completed for 553 hosts
17:07 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
17:04 bd808: Ran cache.purge_openstack_users() for Striker following deploy of e1f7491 (T331674)
17:04 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45787 and previous config saved to /var/cache/conftool/dbconfig/20230313-165449-marostegui.json
16:47 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45785 and previous config saved to /var/cache/conftool/dbconfig/20230313-164410-marostegui.json
16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45784 and previous config saved to /var/cache/conftool/dbconfig/20230313-164349-marostegui.json
16:36 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45783 and previous config saved to /var/cache/conftool/dbconfig/20230313-162843-marostegui.json
16:20 moritzm: imported tideways 5.0.4-2+wmf1+buster1+icu67u1 T329491
16:18 dancy@deploy2002: Finished scap: testing (duration: 06m 53s)
16:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
16:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
16:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
16:16 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
16:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
16:16 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45782 and previous config saved to /var/cache/conftool/dbconfig/20230313-161337-marostegui.json
16:11 dancy@deploy2002: Started scap: testing
16:06 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 15s)
16:00 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 T329491
16:00 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 43s)
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45781 and previous config saved to /var/cache/conftool/dbconfig/20230313-155830-marostegui.json
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45780 and previous config saved to /var/cache/conftool/dbconfig/20230313-154641-marostegui.json
15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
15:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
15:35 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 T329491
15:31 dancy@deploy2002: Finished scap: testing T329857 (duration: 10m 08s)
15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
15:21 dancy@deploy2002: Started scap: testing T329857
15:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
15:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45779 and previous config saved to /var/cache/conftool/dbconfig/20230313-150523-marostegui.json
15:03 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
14:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
14:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
14:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
14:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45778 and previous config saved to /var/cache/conftool/dbconfig/20230313-145016-marostegui.json
14:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
14:38 jbond: disable puppet fleet wide to debug strange issue
14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45777 and previous config saved to /var/cache/conftool/dbconfig/20230313-143510-marostegui.json
14:23 claime: switch noc.wikimedia.org from eqiad to codfw - T331634
14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45776 and previous config saved to /var/cache/conftool/dbconfig/20230313-142004-marostegui.json
14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45774 and previous config saved to /var/cache/conftool/dbconfig/20230313-141409-marostegui.json
14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45773 and previous config saved to /var/cache/conftool/dbconfig/20230313-141348-marostegui.json
14:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45772 and previous config saved to /var/cache/conftool/dbconfig/20230313-135842-marostegui.json
13:50 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
13:49 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
13:48 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
13:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6] (duration: 00m 11s)
13:48 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6]
13:47 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
13:46 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
13:45 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45770 and previous config saved to /var/cache/conftool/dbconfig/20230313-134336-marostegui.json
13:40 moritzm: imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 T329491
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45769 and previous config saved to /var/cache/conftool/dbconfig/20230313-132829-marostegui.json
13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
13:23 taavi@deploy2002: Finished scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) (duration: 08m 10s)
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45768 and previous config saved to /var/cache/conftool/dbconfig/20230313-132123-marostegui.json
13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45767 and previous config saved to /var/cache/conftool/dbconfig/20230313-132101-marostegui.json
13:16 taavi@deploy2002: taavi and superpes: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:15 taavi@deploy2002: Started scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047)
13:13 taavi@deploy2002: Finished scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) (duration: 09m 29s)
13:11 moritzm: imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 T329491
13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45766 and previous config saved to /var/cache/conftool/dbconfig/20230313-130555-marostegui.json
13:05 taavi@deploy2002: stang and taavi: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:03 taavi@deploy2002: Started scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691)
13:00 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1 T329491
12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45764 and previous config saved to /var/cache/conftool/dbconfig/20230313-125049-marostegui.json
12:48 hnowlan: restarting codfw thumbor instances to attempt to remedy 502 issues
12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.codfw.wmnet
12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.codfw.wmnet
12:37 moritzm: imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1 T329491
12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45763 and previous config saved to /var/cache/conftool/dbconfig/20230313-123543-marostegui.json
12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45762 and previous config saved to /var/cache/conftool/dbconfig/20230313-122928-marostegui.json
12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45761 and previous config saved to /var/cache/conftool/dbconfig/20230313-122906-marostegui.json
12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:19 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 T329491
12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45760 and previous config saved to /var/cache/conftool/dbconfig/20230313-121400-marostegui.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45759 and previous config saved to /var/cache/conftool/dbconfig/20230313-115854-marostegui.json
11:58 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 T329491
11:46 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1 T329491
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45758 and previous config saved to /var/cache/conftool/dbconfig/20230313-114348-marostegui.json
11:31 moritzm: imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 T329491
11:22 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
11:21 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
11:11 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 T329491
10:55 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 T329491
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45757 and previous config saved to /var/cache/conftool/dbconfig/20230313-104322-marostegui.json
10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45756 and previous config saved to /var/cache/conftool/dbconfig/20230313-104246-marostegui.json
10:38 moritzm: imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 T329491
10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45755 and previous config saved to /var/cache/conftool/dbconfig/20230313-102740-marostegui.json
10:26 moritzm: imported php-defaults 7.4+76+wmf1~buster2+icu67u1 T329491
10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55701
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45754 and previous config saved to /var/cache/conftool/dbconfig/20230313-101234-marostegui.json
10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55701
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38193
10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38193
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46632
10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46632
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6663
10:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6663
10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45558
10:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45558
10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
10:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
10:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 668
10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 668
10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
10:02 moritzm: imported dh-php 0.35+wmf1+buster1+icu67u1 T329491
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45753 and previous config saved to /var/cache/conftool/dbconfig/20230313-095728-marostegui.json
09:55 vgutierrez: Enable haproxy hardening in cp hosts globally - T323944
09:52 zabe@deploy2002: Finished scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] (duration: 07m 40s)
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45752 and previous config saved to /var/cache/conftool/dbconfig/20230313-095119-marostegui.json
09:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
09:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45751 and previous config saved to /var/cache/conftool/dbconfig/20230313-095058-marostegui.json
09:48 jayme: pcc-worker1003:~# rm -r /srv/jenkins/puppet-compiler/40076 - / back to 70%
09:46 zabe@deploy2002: jforrester and zabe: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
09:45 jayme: pcc-worker1002:~# rm -r /srv/jenkins/puppet-compiler/40078 - / back to 47% usage
09:44 zabe@deploy2002: Started scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply]
09:44 zabe@deploy2002: Finished scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) (duration: 07m 52s)
09:40 jayme: pcc-worker1001:~# rm -r /srv/jenkins/puppet-compiler/40079 /srv/jenkins/puppet-compiler/38943 - / back to 68% usage
09:38 zabe@deploy2002: zabe: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:36 zabe@deploy2002: Started scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45750 and previous config saved to /var/cache/conftool/dbconfig/20230313-093552-marostegui.json
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45749 and previous config saved to /var/cache/conftool/dbconfig/20230313-092045-marostegui.json
09:16 moritzm: installing python-werkzeug security updates
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45748 and previous config saved to /var/cache/conftool/dbconfig/20230313-090539-marostegui.json
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45747 and previous config saved to /var/cache/conftool/dbconfig/20230313-085937-marostegui.json
08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45746 and previous config saved to /var/cache/conftool/dbconfig/20230313-085916-marostegui.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45745 and previous config saved to /var/cache/conftool/dbconfig/20230313-084409-marostegui.json
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45744 and previous config saved to /var/cache/conftool/dbconfig/20230313-082903-marostegui.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45743 and previous config saved to /var/cache/conftool/dbconfig/20230313-081357-marostegui.json
08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45742 and previous config saved to /var/cache/conftool/dbconfig/20230313-080759-marostegui.json
08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45741 and previous config saved to /var/cache/conftool/dbconfig/20230313-080738-marostegui.json
08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:02 moritzm: installing curl security updates
07:58 zabe@deploy2002: Finished scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) (duration: 07m 02s)
07:53 zabe@deploy2002: zabe: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45740 and previous config saved to /var/cache/conftool/dbconfig/20230313-075232-marostegui.json
07:51 zabe@deploy2002: Started scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482)
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45739 and previous config saved to /var/cache/conftool/dbconfig/20230313-073725-marostegui.json
07:37 marostegui: Remove pagetriage_log from enwiki T328309
07:32 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) (duration: 17m 04s)
07:25 kartik@deploy2002: kartik: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45738 and previous config saved to /var/cache/conftool/dbconfig/20230313-072219-marostegui.json
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45737 and previous config saved to /var/cache/conftool/dbconfig/20230313-071522-marostegui.json
07:15 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)
07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45736 and previous config saved to /var/cache/conftool/dbconfig/20230313-071501-marostegui.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45735 and previous config saved to /var/cache/conftool/dbconfig/20230313-065954-marostegui.json
06:52 marostegui_: Remove pagetriage_log from testwiki and test2wiki T328309
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45734 and previous config saved to /var/cache/conftool/dbconfig/20230313-064448-marostegui.json
06:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9873
06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9873
06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9507
06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9507
06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15830
06:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15830
06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9902
06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9902
06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45733 and previous config saved to /var/cache/conftool/dbconfig/20230313-062942-marostegui.json
06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34549
06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34549
06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 29357
06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45732 and previous config saved to /var/cache/conftool/dbconfig/20230313-062244-marostegui.json
06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138886
06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138886
06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
06:16 marostegui_: Deploy schema change on s3 codfw dbmaint T329684
06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
04:37 kart_: Updated cxserver to 2023-03-09-061555-production (T331097, T327102, T326541)
04:19 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
04:19 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
04:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
04:17 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
04:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
04:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-03-12

10:47 elukey: reset offsets on kafka jumbo for benthos webrequest live (as indicated in https://phabricator.wikimedia.org/T331801#8685569)
07:50 elukey: restart benthos-webrequest-live on centrallog1002 - T331801
07:49 elukey: restart benthos-webrequest-live on centrallog2002 - T331801
07:49 elukey: stop and mask benthos-webrequest-live on centrallog1001 - T331801

2023-03-10

22:43 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
22:32 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
22:26 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
22:16 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
21:24 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:14 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
21:13 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:03 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
20:43 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78] (duration: 00m 10s)
20:43 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78]
20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
19:39 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 00m 09s)
19:38 milimetric@deploy2002: Started deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942]
19:38 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 08m 08s)
19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
19:30 milimetric@deploy2002: Started deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942]
19:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
19:17 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
19:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
19:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
18:55 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944] (duration: 00m 12s)
18:55 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944]
18:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
18:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
18:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
18:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
18:12 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
18:04 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:53 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:52 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:40 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:28 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
17:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
16:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
16:42 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
16:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
16:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
16:04 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2003-dev']
16:04 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
15:59 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:59 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:57 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
15:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2003-dev']
15:53 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
15:53 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
15:50 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
15:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2002-dev']
15:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
15:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2002-dev']
15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:08 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
14:52 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
14:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:47 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:38 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
14:36 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
14:22 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
14:20 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
14:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
14:08 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
13:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
13:34 Emperor: restart swift-object-replicator on ms-be2067
13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
12:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
12:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
12:46 moritzm: installing libsdl2 security updates
12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
12:31 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
12:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
12:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:15 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
11:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader1004.wikimedia.org with OS bullseye
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
11:35 moritzm: installing isc-dhcp bugfix updates from DLA 3326
11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1004.wikimedia.org with OS bullseye
11:04 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=jawiki --logwiki=metawiki --ignorestatus 'あーあーあーあーあー' 'ARIAUSO' # T331685
11:03 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ZSTK Lublin' 'Sonabet4' # T331685
11:01 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Yair.herman' 'Manor258' # T331685
10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Tranquill Komnin' 'Nevechear' # T331685
10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Tosikuni Japan' 'Revisionist14' # T331685
10:54 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Studio 7 Piaseczno Jarosław Zawadzki' 'Jarosław Andrzej Zawadzki (muzyk)' # T331685
10:52 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Siniy7' 'Viktorbublik' # T331685
10:51 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki --ignorestatus 'Reza amjad(iran)' 'رضا امجد (تبریز)' # T331685
10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Mac700' 'Unknown001100' # T331685
10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'HonzaSTECH' 'ShadyMedic' # T331685
10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mac700' 'Unknown001100' # T331685
10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'HonzaSTECH' 'ShadyMedic' # T331685
10:40 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
09:57 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
09:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
02:09 zabe@deploy2002: Finished scap: T331685 (duration: 07m 52s)
02:02 zabe@deploy2002: Started scap: T331685
02:01 zabe@deploy2002: Finished scap: T331685 (duration: 07m 28s)
02:00 ejegg: SmashPig upgraded from c6775c60 to 3b84e4cb
01:55 ejegg: payments-wiki upgraded from 05a5e09a to 61c30a4f
01:54 zabe@deploy2002: Started scap: T331685
01:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye

2023-03-09

23:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting (duration: 00m 14s)
23:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting
23:33 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor (duration: 00m 14s)
23:32 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor
23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
23:01 sukhe: pool new dns hosts dns1003 and dns2003: T330670
22:53 sukhe: run homer in cr*-{codfw,eqiad} for CR 896190
22:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2003.wikimedia.org with OS bullseye
22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
22:41 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
22:40 bd808: Forced puppet run on cloudweb100[34] to apply quick fix for T331674
22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
22:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1003.wikimedia.org with OS bullseye
22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
22:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
22:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
22:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
22:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
22:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
22:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns2003.wikimedia.org with OS bullseye
21:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
21:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
21:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
21:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
21:38 TheresNoTime: close UTC late backport
21:37 samtar@deploy2002: Finished scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) (duration: 10m 43s)
21:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
21:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
21:28 samtar@deploy2002: samtar and nray: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:27 samtar@deploy2002: Started scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)
21:24 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part II of II (duration: 07m 38s)
21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
21:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
21:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
21:18 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part II of II synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:17 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part II of II
21:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:14 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part I of II (duration: 12m 19s)
21:10 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns2003
21:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
21:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns2003
21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
21:08 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
21:07 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
21:03 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part I of II synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.mgmt.codfw.wmnet on all recursors
21:02 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.mgmt.codfw.wmnet on all recursors
21:02 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part I of II
20:59 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.wikimedia.org on all recursors
20:59 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.wikimedia.org on all recursors
20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
20:46 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
20:44 sukhe@cumin2002: START - Cookbook sre.dns.netbox
20:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1003.wikimedia.org']
20:30 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1003.wikimedia.org']
20:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
20:24 topranks: move cloud-hosts1-b-codfw GW from core routers to cloudsw1-b1-codfw T327919
20:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
20:12 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns1003.wikimedia.org on all recursors
20:12 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns1003.wikimedia.org on all recursors
20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
20:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
20:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
19:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
19:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
19:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
19:15 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
19:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.26 refs T330204
19:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
18:53 sukhe: enable puppet on A:dns-rec and force puppet run: T330670
18:50 mforns@deploy2002: Finished deploy [airflow-dags/analytics@3419b7d]: (no justification provided) (duration: 00m 10s)
18:50 mforns@deploy2002: Started deploy [airflow-dags/analytics@3419b7d]: (no justification provided)
18:47 sukhe: enable puppet on dns4003 to merge 895894
18:44 sukhe: disable puppet on A:dns-rec to merge CR 895894
18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
18:34 sukhe: [correction] homer "cr*-codfw*" commit "Remove authdns2001 from homer, T330670"
18:34 sukhe: homer "cr*-codfw*" commit "Remove authdns1001 from homer, T330670"
18:31 sukhe: homer "cr*-eqiad*" commit "Remove authdns1001 from homer, T330670"
18:26 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts authdns[1001,2001].wikimedia.org
18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
18:24 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
18:22 sukhe: running puppet-agent on A:dns-auth to remove deprecated authdns[12]001
18:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
18:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:15 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts authdns[1001,2001].wikimedia.org
18:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
18:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
18:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:00 sukhe: cr*-codfw [ns0]: set routing-options static route 208.80.154.238/32 next-hop 208.80.153.77: T330670
17:53 sukhe: cr*-codfw [ns1]: set routing-options static route 208.80.153.231/32 next-hop 208.80.153.77: T330670
17:50 zabe@deploy2002: Finished scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) (duration: 11m 57s)
17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45725 and previous config saved to /var/cache/conftool/dbconfig/20230309-174723-marostegui.json
17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:42 sukhe: [ns1] set routing-options static route 208.80.153.231/32 next-hop 208.80.154.10: T330670
17:39 zabe@deploy2002: zabe and ssastry: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
17:38 zabe@deploy2002: Started scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629)
17:37 sukhe: cr2-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
17:37 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
17:36 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10
17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45724 and previous config saved to /var/cache/conftool/dbconfig/20230309-173217-marostegui.json
17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45723 and previous config saved to /var/cache/conftool/dbconfig/20230309-171711-marostegui.json
17:13 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (prod links) T327919
17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45722 and previous config saved to /var/cache/conftool/dbconfig/20230309-170205-marostegui.json
16:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45721 and previous config saved to /var/cache/conftool/dbconfig/20230309-165210-marostegui.json
16:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45720 and previous config saved to /var/cache/conftool/dbconfig/20230309-165149-marostegui.json
16:51 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (cloud vrf) T327919
16:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45719 and previous config saved to /var/cache/conftool/dbconfig/20230309-163643-marostegui.json
16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45718 and previous config saved to /var/cache/conftool/dbconfig/20230309-162608-root.json
16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45717 and previous config saved to /var/cache/conftool/dbconfig/20230309-162137-marostegui.json
16:18 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief1001.eqiad.wmnet with OS bullseye
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45716 and previous config saved to /var/cache/conftool/dbconfig/20230309-161103-root.json
16:09 zabe@deploy2002: Finished scap: T308932 (duration: 07m 19s)
16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45715 and previous config saved to /var/cache/conftool/dbconfig/20230309-160630-marostegui.json
16:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
16:03 marostegui: Restart mailman service T331626
16:02 zabe@deploy2002: Started scap: T308932
16:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
16:00 marostegui: Failover m5 from db1183 to db1176 - T330847
15:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45714 and previous config saved to /var/cache/conftool/dbconfig/20230309-155558-root.json
15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45713 and previous config saved to /var/cache/conftool/dbconfig/20230309-155520-marostegui.json
15:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45712 and previous config saved to /var/cache/conftool/dbconfig/20230309-155459-marostegui.json
15:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45711 and previous config saved to /var/cache/conftool/dbconfig/20230309-154053-root.json
15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45710 and previous config saved to /var/cache/conftool/dbconfig/20230309-153953-marostegui.json
15:29 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief1001.eqiad.wmnet with OS bullseye
15:27 brett: Enable puppet on R:acme_chief::cert - T321309
15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45709 and previous config saved to /var/cache/conftool/dbconfig/20230309-152447-marostegui.json
15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
15:15 moritzm: installing PHP 7.3 security updates (as shipped in Debian)
15:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
15:14 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:13 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:11 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45707 and previous config saved to /var/cache/conftool/dbconfig/20230309-151100-marostegui.json
15:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
15:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
15:10 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:10 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45706 and previous config saved to /var/cache/conftool/dbconfig/20230309-150940-marostegui.json
15:06 brett: Disable puppet on R:acme_chief::cert for acmechief maintenance - T321309
15:04 zabe@deploy2002: Finished scap: Backport for Drop unused FlaggedRevs threshold level names (T277883) (duration: 10m 48s)
15:04 TheresNoTime: close UTC afternoon backport window
15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
15:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
15:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:56 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:55 zabe@deploy2002: awight and zabe: Backport for Drop unused FlaggedRevs threshold level names (T277883) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:55 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:54 zabe@deploy2002: Started scap: Backport for Drop unused FlaggedRevs threshold level names (T277883)
14:34 moritzm: installing apr security updates
14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
14:30 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f774711]: (no justification provided) (duration: 19m 03s)
14:13 samtar@deploy2002: Finished scap: Backport for Bump parsoid parser cache writes to 50%. (T320534) (duration: 07m 28s)
14:11 jgiannelos@deploy2002: Started deploy [restbase/deploy@f774711]: (no justification provided)
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45705 and previous config saved to /var/cache/conftool/dbconfig/20230309-140915-marostegui.json
14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45704 and previous config saved to /var/cache/conftool/dbconfig/20230309-140850-marostegui.json
14:08 Emperor: testing disk-swap in ms-be1066 T329305
14:07 samtar@deploy2002: daniel and samtar: Backport for Bump parsoid parser cache writes to 50%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:05 samtar@deploy2002: Started scap: Backport for Bump parsoid parser cache writes to 50%. (T320534)
14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45703 and previous config saved to /var/cache/conftool/dbconfig/20230309-140510-marostegui.json
14:00 aqu@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b] (duration: 00m 13s)
14:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b]
13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45702 and previous config saved to /var/cache/conftool/dbconfig/20230309-135343-marostegui.json
13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45701 and previous config saved to /var/cache/conftool/dbconfig/20230309-135004-marostegui.json
13:42 moritzm: restarting FPM/Apache on mw canaries to pick up curl updates
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45700 and previous config saved to /var/cache/conftool/dbconfig/20230309-133837-marostegui.json
13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45699 and previous config saved to /var/cache/conftool/dbconfig/20230309-133458-marostegui.json
13:34 moritzm: installing curl security updates
13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45698 and previous config saved to /var/cache/conftool/dbconfig/20230309-132331-marostegui.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45697 and previous config saved to /var/cache/conftool/dbconfig/20230309-131951-marostegui.json
13:17 vgutierrez: rolling restart of pybal in lvs2009 and lvs2010
13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45696 and previous config saved to /var/cache/conftool/dbconfig/20230309-131136-marostegui.json
13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
13:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
13:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:03 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45695 and previous config saved to /var/cache/conftool/dbconfig/20230309-130315-marostegui.json
12:57 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=aqs,dc=codfw
12:55 btullis@puppetmaster1001: conftool action : set/weight=10; selector: cluster=aqs,dc=codfw
12:53 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs2001.codfw.wmnet
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45694 and previous config saved to /var/cache/conftool/dbconfig/20230309-124809-marostegui.json
12:46 btullis@cumin1001: START - Cookbook sre.dns.netbox
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45693 and previous config saved to /var/cache/conftool/dbconfig/20230309-124025-marostegui.json
12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45692 and previous config saved to /var/cache/conftool/dbconfig/20230309-124004-marostegui.json
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45691 and previous config saved to /var/cache/conftool/dbconfig/20230309-123303-marostegui.json
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45690 and previous config saved to /var/cache/conftool/dbconfig/20230309-123015-root.json
12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45689 and previous config saved to /var/cache/conftool/dbconfig/20230309-122458-marostegui.json
12:22 moritzm: rebalancing ganeti eqiad/C after completion of bullseye updates T311687
12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45688 and previous config saved to /var/cache/conftool/dbconfig/20230309-121756-marostegui.json
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45687 and previous config saved to /var/cache/conftool/dbconfig/20230309-121510-root.json
12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45686 and previous config saved to /var/cache/conftool/dbconfig/20230309-120951-marostegui.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45685 and previous config saved to /var/cache/conftool/dbconfig/20230309-120559-marostegui.json
12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45684 and previous config saved to /var/cache/conftool/dbconfig/20230309-120537-marostegui.json
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45683 and previous config saved to /var/cache/conftool/dbconfig/20230309-120005-root.json
11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45682 and previous config saved to /var/cache/conftool/dbconfig/20230309-115445-marostegui.json
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45681 and previous config saved to /var/cache/conftool/dbconfig/20230309-115031-marostegui.json
11:47 marostegui: Deploy schema change on s1 codfw dbmaint T329684
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45680 and previous config saved to /var/cache/conftool/dbconfig/20230309-114500-root.json
11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45679 and previous config saved to /var/cache/conftool/dbconfig/20230309-114338-marostegui.json
11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
11:40 moritzm: installing git security updates
11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45678 and previous config saved to /var/cache/conftool/dbconfig/20230309-113525-marostegui.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45677 and previous config saved to /var/cache/conftool/dbconfig/20230309-112804-marostegui.json
11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45676 and previous config saved to /var/cache/conftool/dbconfig/20230309-112739-marostegui.json
11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45675 and previous config saved to /var/cache/conftool/dbconfig/20230309-112019-marostegui.json
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45674 and previous config saved to /var/cache/conftool/dbconfig/20230309-111233-marostegui.json
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45673 and previous config saved to /var/cache/conftool/dbconfig/20230309-110827-marostegui.json
11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45672 and previous config saved to /var/cache/conftool/dbconfig/20230309-110806-marostegui.json
11:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
11:01 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
11:00 otto@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Step 2b: InitialiseSettings.php - remove duplicate configs - T308932 (duration: 06m 37s)
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45671 and previous config saved to /var/cache/conftool/dbconfig/20230309-105726-marostegui.json
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45670 and previous config saved to /var/cache/conftool/dbconfig/20230309-105259-marostegui.json
10:50 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Step 2a: ext-EventLogging.php - remove duplicate configs - T308932 (duration: 06m 32s)
10:47 topranks: Resetting PIC in slot 1/0 on cr2-codfw T331527
10:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
10:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45669 and previous config saved to /var/cache/conftool/dbconfig/20230309-104220-marostegui.json
10:39 otto@deploy2002: Synchronized multiversion/MWConfigCacheGenerator.php: Step 1b: MWConfigCacheGenerator.php - load ext-EventStreamConfig.php - T308932 (duration: 06m 23s)
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45668 and previous config saved to /var/cache/conftool/dbconfig/20230309-103753-marostegui.json
10:32 hashar@deploy2002: Finished deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others (duration: 00m 08s)
10:32 hashar@deploy2002: Started deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others
10:29 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Step 1a: ext-EventStreamConfig.php - wgEventStreams lives here - T308932 (duration: 06m 43s)
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
10:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:23 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45667 and previous config saved to /var/cache/conftool/dbconfig/20230309-102247-marostegui.json
10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:22 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
10:22 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:22 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
10:21 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:21 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
10:19 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:19 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:12 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45666 and previous config saved to /var/cache/conftool/dbconfig/20230309-101042-marostegui.json
10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
10:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45665 and previous config saved to /var/cache/conftool/dbconfig/20230309-101020-marostegui.json
10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45664 and previous config saved to /var/cache/conftool/dbconfig/20230309-100611-marostegui.json
10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:01 topranks: commencing work to drain cr2-codfw ports on card 1/0 (T331601)
09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
09:55 marostegui: Deploy schema change on s4 codfw dbmaint T329684
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45663 and previous config saved to /var/cache/conftool/dbconfig/20230309-095514-marostegui.json
09:53 marostegui: Deploy schema change on s8 codfw dbmaint T329684
09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
09:48 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45662 and previous config saved to /var/cache/conftool/dbconfig/20230309-094602-root.json
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45661 and previous config saved to /var/cache/conftool/dbconfig/20230309-094008-marostegui.json
09:33 topranks: resetting PIC 1/0 on cr1-codfw
09:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
09:32 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45660 and previous config saved to /var/cache/conftool/dbconfig/20230309-093120-marostegui.json
09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45659 and previous config saved to /var/cache/conftool/dbconfig/20230309-093057-root.json
09:29 elukey: delete old/unused ML-related docker images from the registry - T331513
09:27 topranks: disabling Transit cct on cr1-codfw xe-1/0/1:0 (T331527)
09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45658 and previous config saved to /var/cache/conftool/dbconfig/20230309-092502-marostegui.json
09:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1011.eqiad.wmnet with OS bullseye
09:21 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
09:20 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
09:19 marostegui: Deploy schema change on s7 codfw dbmaint T329684
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45657 and previous config saved to /var/cache/conftool/dbconfig/20230309-091613-marostegui.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45656 and previous config saved to /var/cache/conftool/dbconfig/20230309-091552-root.json
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45655 and previous config saved to /var/cache/conftool/dbconfig/20230309-091400-marostegui.json
09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:13 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45654 and previous config saved to /var/cache/conftool/dbconfig/20230309-091338-marostegui.json
09:13 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
09:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
09:12 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45653 and previous config saved to /var/cache/conftool/dbconfig/20230309-090107-marostegui.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45652 and previous config saved to /var/cache/conftool/dbconfig/20230309-090048-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45651 and previous config saved to /var/cache/conftool/dbconfig/20230309-085832-marostegui.json
08:54 marostegui: Deploy schema change on s2 codfw dbmaint T329684
08:54 marostegui: Deploy schema change on s5 codfw dbmaint T329684
08:54 marostegui: Deploy schema change on s6 codfw dbmaint T329684
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS bullseye
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45650 and previous config saved to /var/cache/conftool/dbconfig/20230309-084601-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45649 and previous config saved to /var/cache/conftool/dbconfig/20230309-084543-root.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45648 and previous config saved to /var/cache/conftool/dbconfig/20230309-084359-marostegui.json
08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45647 and previous config saved to /var/cache/conftool/dbconfig/20230309-084326-marostegui.json
08:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:39 taavi@deploy2002: Finished scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) (duration: 11m 37s)
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45646 and previous config saved to /var/cache/conftool/dbconfig/20230309-083802-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45645 and previous config saved to /var/cache/conftool/dbconfig/20230309-083604-root.json
08:33 moritzm: remove ganeti1011 for eventual reimage T311687
08:30 taavi@deploy2002: taavi and kharlan: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45644 and previous config saved to /var/cache/conftool/dbconfig/20230309-082820-marostegui.json
08:28 taavi@deploy2002: Started scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264)
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
08:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45643 and previous config saved to /var/cache/conftool/dbconfig/20230309-082257-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45642 and previous config saved to /var/cache/conftool/dbconfig/20230309-082059-root.json
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45641 and previous config saved to /var/cache/conftool/dbconfig/20230309-081707-marostegui.json
08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45640 and previous config saved to /var/cache/conftool/dbconfig/20230309-081646-marostegui.json
08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45639 and previous config saved to /var/cache/conftool/dbconfig/20230309-080858-marostegui.json
08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45638 and previous config saved to /var/cache/conftool/dbconfig/20230309-080837-marostegui.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45637 and previous config saved to /var/cache/conftool/dbconfig/20230309-080752-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45636 and previous config saved to /var/cache/conftool/dbconfig/20230309-080555-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45635 and previous config saved to /var/cache/conftool/dbconfig/20230309-080140-marostegui.json
07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45634 and previous config saved to /var/cache/conftool/dbconfig/20230309-075331-marostegui.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45633 and previous config saved to /var/cache/conftool/dbconfig/20230309-075247-root.json
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45632 and previous config saved to /var/cache/conftool/dbconfig/20230309-075050-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45631 and previous config saved to /var/cache/conftool/dbconfig/20230309-074633-marostegui.json
07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45630 and previous config saved to /var/cache/conftool/dbconfig/20230309-073825-marostegui.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45629 and previous config saved to /var/cache/conftool/dbconfig/20230309-073743-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45628 and previous config saved to /var/cache/conftool/dbconfig/20230309-073545-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45627 and previous config saved to /var/cache/conftool/dbconfig/20230309-073127-marostegui.json
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45626 and previous config saved to /var/cache/conftool/dbconfig/20230309-072319-marostegui.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45625 and previous config saved to /var/cache/conftool/dbconfig/20230309-072238-root.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45624 and previous config saved to /var/cache/conftool/dbconfig/20230309-072040-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329684)', diff saved to https://phabricator.wikimedia.org/P45623 and previous config saved to /var/cache/conftool/dbconfig/20230309-071853-marostegui.json
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45622 and previous config saved to /var/cache/conftool/dbconfig/20230309-071809-marostegui.json
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
07:15 marostegui: Deploy schema change on s3 eqiad dbmaint T329684
07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
07:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
07:13 marostegui: Deploy schema change on s7 eqiad dbmaint T329684
07:13 marostegui: Deploy schema change on s8 eqiad dbmaint T329684
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P45621 and previous config saved to /var/cache/conftool/dbconfig/20230309-071029-root.json
07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45620 and previous config saved to /var/cache/conftool/dbconfig/20230309-070805-marostegui.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45619 and previous config saved to /var/cache/conftool/dbconfig/20230309-070733-root.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45618 and previous config saved to /var/cache/conftool/dbconfig/20230309-070658-marostegui.json
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329684)', diff saved to https://phabricator.wikimedia.org/P45617 and previous config saved to /var/cache/conftool/dbconfig/20230309-070327-marostegui.json
07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45616 and previous config saved to /var/cache/conftool/dbconfig/20230309-070223-marostegui.json
07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
06:48 marostegui: Deploy schema change on s1 eqiad dbmaint T329684
06:48 marostegui: Deploy schema change on s4 eqiad dbmaint T329684
06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45615 and previous config saved to /var/cache/conftool/dbconfig/20230309-064538-marostegui.json
06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
06:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
06:43 marostegui: Deploy schema change on s2 eqiad dbmaint T329684
06:42 marostegui: Deploy schema change on s5 eqiad dbmaint T329684
06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
06:40 marostegui: Deploy schema change on s6 eqiad dbmaint T329684
06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
04:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45614 and previous config saved to /var/cache/conftool/dbconfig/20230309-040925-marostegui.json
03:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45613 and previous config saved to /var/cache/conftool/dbconfig/20230309-035418-marostegui.json
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45612 and previous config saved to /var/cache/conftool/dbconfig/20230309-033912-marostegui.json
03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45611 and previous config saved to /var/cache/conftool/dbconfig/20230309-032406-marostegui.json
03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45610 and previous config saved to /var/cache/conftool/dbconfig/20230309-030445-marostegui.json
03:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
03:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45609 and previous config saved to /var/cache/conftool/dbconfig/20230309-030424-marostegui.json
02:59 sukhe: run keyholder arm on acmechief2001
02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45608 and previous config saved to /var/cache/conftool/dbconfig/20230309-024917-marostegui.json
02:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45607 and previous config saved to /var/cache/conftool/dbconfig/20230309-023411-marostegui.json
02:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45606 and previous config saved to /var/cache/conftool/dbconfig/20230309-021905-marostegui.json
01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45604 and previous config saved to /var/cache/conftool/dbconfig/20230309-015831-marostegui.json
01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45603 and previous config saved to /var/cache/conftool/dbconfig/20230309-015810-marostegui.json
01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45602 and previous config saved to /var/cache/conftool/dbconfig/20230309-014303-marostegui.json
01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45601 and previous config saved to /var/cache/conftool/dbconfig/20230309-012757-marostegui.json
01:18 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors (duration: 00m 13s)
01:18 ebernhardson@deploy2002: Started deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors
01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45600 and previous config saved to /var/cache/conftool/dbconfig/20230309-011251-marostegui.json
00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45599 and previous config saved to /var/cache/conftool/dbconfig/20230309-005220-marostegui.json
00:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
00:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45598 and previous config saved to /var/cache/conftool/dbconfig/20230309-005210-marostegui.json
00:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45597 and previous config saved to /var/cache/conftool/dbconfig/20230309-003703-marostegui.json
00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45596 and previous config saved to /var/cache/conftool/dbconfig/20230309-002157-marostegui.json
00:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45594 and previous config saved to /var/cache/conftool/dbconfig/20230309-000651-marostegui.json

2023-03-08

23:50 zabe@deploy2002: Finished scap: T308932 (duration: 07m 15s)
23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45593 and previous config saved to /var/cache/conftool/dbconfig/20230308-234534-marostegui.json
23:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
23:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45592 and previous config saved to /var/cache/conftool/dbconfig/20230308-234502-marostegui.json
23:42 zabe@deploy2002: Started scap: T308932
23:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths (duration: 00m 14s)
23:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths
23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45591 and previous config saved to /var/cache/conftool/dbconfig/20230308-232956-marostegui.json
23:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45590 and previous config saved to /var/cache/conftool/dbconfig/20230308-231449-marostegui.json
22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45589 and previous config saved to /var/cache/conftool/dbconfig/20230308-225943-marostegui.json
22:44 hashar: Upgrading CI Jenkins
22:42 tgr: UTC late deploys done
22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45588 and previous config saved to /var/cache/conftool/dbconfig/20230308-224044-marostegui.json
22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45587 and previous config saved to /var/cache/conftool/dbconfig/20230308-224018-marostegui.json
22:39 tgr@deploy2002: Finished scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) (duration: 08m 31s)
22:32 tgr@deploy2002: tgr: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:30 tgr@deploy2002: Started scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524)
22:29 tgr@deploy2002: Finished scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) (duration: 07m 43s)
22:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
22:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45586 and previous config saved to /var/cache/conftool/dbconfig/20230308-222512-marostegui.json
22:23 tgr@deploy2002: tgr: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:21 tgr@deploy2002: Started scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412)
22:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
22:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
22:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45585 and previous config saved to /var/cache/conftool/dbconfig/20230308-221006-marostegui.json
22:09 kindrobot: hand off backport window UTC late to tgr for self-service
22:07 kindrobot@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) (duration: 09m 36s)
21:59 kindrobot@deploy2002: sbailey and kindrobot: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:57 kindrobot@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612)
21:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45584 and previous config saved to /var/cache/conftool/dbconfig/20230308-215500-marostegui.json
21:54 kindrobot@deploy2002: Finished scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) (duration: 07m 49s)
21:48 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.cod
21:46 kindrobot@deploy2002: Started scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588)
21:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
21:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
21:30 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqi
21:29 kindrobot@deploy2002: Started scap: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444)
21:22 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix (duration: 00m 05s)
21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix
21:19 kindrobot: start UTC-late backport window
21:08 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided) (duration: 01m 01s)
21:07 hashar@deploy2002: Started deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided)
20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45583 and previous config saved to /var/cache/conftool/dbconfig/20230308-205435-marostegui.json
20:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
20:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45582 and previous config saved to /var/cache/conftool/dbconfig/20230308-205414-marostegui.json
20:51 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief2001.codfw.wmnet with OS bullseye
20:41 mutante: deploy2002 - systemctl restart keyholder-proxy.service to fix T331568 - after this SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -i /etc/keyholder.d/deploy_jenkins -l deploy-jenkins releases1002.eqiad.wmnet works
20:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
20:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45581 and previous config saved to /var/cache/conftool/dbconfig/20230308-203907-marostegui.json
20:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
20:24 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief2001.codfw.wmnet with OS bullseye
20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45580 and previous config saved to /var/cache/conftool/dbconfig/20230308-202401-marostegui.json
20:18 urandom: power cycle restbase2022 (unresponsive; cannot SSH)
20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45579 and previous config saved to /var/cache/conftool/dbconfig/20230308-200855-marostegui.json
20:01 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bullseye
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45578 and previous config saved to /var/cache/conftool/dbconfig/20230308-194646-marostegui.json
19:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45577 and previous config saved to /var/cache/conftool/dbconfig/20230308-194625-marostegui.json
19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
19:31 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test1001.eqiad.wmnet with OS bullseye
19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45576 and previous config saved to /var/cache/conftool/dbconfig/20230308-193118-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45575 and previous config saved to /var/cache/conftool/dbconfig/20230308-191612-marostegui.json
19:16 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.26 refs T330204 (duration: 06m 16s)
19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
19:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.26 refs T330204
19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief-test2001.codfw.wmnet
19:09 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief-test2001.codfw.wmnet
19:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45574 and previous config saved to /var/cache/conftool/dbconfig/20230308-190106-marostegui.json
18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45573 and previous config saved to /var/cache/conftool/dbconfig/20230308-184328-marostegui.json
18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45572 and previous config saved to /var/cache/conftool/dbconfig/20230308-184204-marostegui.json
18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45571 and previous config saved to /var/cache/conftool/dbconfig/20230308-184143-marostegui.json
18:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45570 and previous config saved to /var/cache/conftool/dbconfig/20230308-183020-ladsgroup.json
18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45569 and previous config saved to /var/cache/conftool/dbconfig/20230308-182822-marostegui.json
18:28 inflatador: bking@cumin2002 repool elastic1060-1066 to finish off T322082
18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45568 and previous config saved to /var/cache/conftool/dbconfig/20230308-182726-marostegui.json
18:27 inflatador: bking@cumin2002 unban elastic1060-1066 to finish off T322082
18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45567 and previous config saved to /var/cache/conftool/dbconfig/20230308-182637-marostegui.json
18:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update locatoin of elastic1064-65 - bking@cumin2002 - T322082"
18:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1064-65 - bking@cumin2002 - T322082"
18:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test2001.codfw.wmnet with OS bullseye
18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45566 and previous config saved to /var/cache/conftool/dbconfig/20230308-181514-ladsgroup.json
18:14 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:13 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "update locatoin of elastic1065 - bking@cumin2002 - T322082"
18:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1065 - bking@cumin2002 - T322082"
18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45565 and previous config saved to /var/cache/conftool/dbconfig/20230308-181316-marostegui.json
18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45564 and previous config saved to /var/cache/conftool/dbconfig/20230308-181220-marostegui.json
18:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update locatoin of elastic1064 - bking@cumin2002 - T322082"
18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45563 and previous config saved to /var/cache/conftool/dbconfig/20230308-181131-marostegui.json
18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1064 - bking@cumin2002 - T322082"
18:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
18:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
18:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45562 and previous config saved to /var/cache/conftool/dbconfig/20230308-180008-ladsgroup.json
17:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
17:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
17:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
17:58 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
17:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45561 and previous config saved to /var/cache/conftool/dbconfig/20230308-175810-marostegui.json
17:58 herron: failing grafana over from codfw to eqiad
17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45560 and previous config saved to /var/cache/conftool/dbconfig/20230308-175714-marostegui.json
17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45559 and previous config saved to /var/cache/conftool/dbconfig/20230308-175625-marostegui.json
17:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:48 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
17:47 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
17:47 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test2001.codfw.wmnet with OS bullseye
17:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45558 and previous config saved to /var/cache/conftool/dbconfig/20230308-174535-marostegui.json
17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
17:45 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45557 and previous config saved to /var/cache/conftool/dbconfig/20230308-174514-marostegui.json
17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45556 and previous config saved to /var/cache/conftool/dbconfig/20230308-174501-ladsgroup.json
17:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45555 and previous config saved to /var/cache/conftool/dbconfig/20230308-174208-marostegui.json
17:38 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45554 and previous config saved to /var/cache/conftool/dbconfig/20230308-173701-marostegui.json
17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45553 and previous config saved to /var/cache/conftool/dbconfig/20230308-173640-marostegui.json
17:34 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
17:34 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45552 and previous config saved to /var/cache/conftool/dbconfig/20230308-173125-marostegui.json
17:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
17:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45551 and previous config saved to /var/cache/conftool/dbconfig/20230308-173104-marostegui.json
17:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45550 and previous config saved to /var/cache/conftool/dbconfig/20230308-173007-marostegui.json
17:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
17:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45549 and previous config saved to /var/cache/conftool/dbconfig/20230308-172134-marostegui.json
17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45548 and previous config saved to /var/cache/conftool/dbconfig/20230308-171558-marostegui.json
17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45547 and previous config saved to /var/cache/conftool/dbconfig/20230308-171501-marostegui.json
17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45546 and previous config saved to /var/cache/conftool/dbconfig/20230308-170627-marostegui.json
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45545 and previous config saved to /var/cache/conftool/dbconfig/20230308-170512-ladsgroup.json
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45543 and previous config saved to /var/cache/conftool/dbconfig/20230308-170051-marostegui.json
16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45542 and previous config saved to /var/cache/conftool/dbconfig/20230308-165955-marostegui.json
16:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45541 and previous config saved to /var/cache/conftool/dbconfig/20230308-165121-marostegui.json
16:49 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45540 and previous config saved to /var/cache/conftool/dbconfig/20230308-164807-marostegui.json
16:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45539 and previous config saved to /var/cache/conftool/dbconfig/20230308-164746-marostegui.json
16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45538 and previous config saved to /var/cache/conftool/dbconfig/20230308-164545-marostegui.json
16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:35 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:34 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
16:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45537 and previous config saved to /var/cache/conftool/dbconfig/20230308-163311-marostegui.json
16:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45536 and previous config saved to /var/cache/conftool/dbconfig/20230308-163249-marostegui.json
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45535 and previous config saved to /var/cache/conftool/dbconfig/20230308-163240-marostegui.json
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45534 and previous config saved to /var/cache/conftool/dbconfig/20230308-163230-marostegui.json
16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
16:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update locatoin of elastic1060 - bking@cumin2002 - T322082"
16:28 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
16:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1060 - bking@cumin2002 - T322082"
16:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
16:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
16:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45533 and previous config saved to /var/cache/conftool/dbconfig/20230308-161737-marostegui.json
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45532 and previous config saved to /var/cache/conftool/dbconfig/20230308-161727-marostegui.json
16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
16:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:10 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
16:08 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1062.eqiad.wmnet
16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
16:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
16:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 4 hosts
16:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 4 hosts
16:03 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45531 and previous config saved to /var/cache/conftool/dbconfig/20230308-160231-marostegui.json
16:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45530 and previous config saved to /var/cache/conftool/dbconfig/20230308-160221-marostegui.json
16:00 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1062.eqiad.wmnet
16:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1061.eqiad.wmnet
15:59 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
15:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
15:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
15:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
15:54 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1061.eqiad.wmnet
15:52 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
15:52 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
15:50 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:49 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45529 and previous config saved to /var/cache/conftool/dbconfig/20230308-154736-marostegui.json
15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45528 and previous config saved to /var/cache/conftool/dbconfig/20230308-154724-marostegui.json
15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45527 and previous config saved to /var/cache/conftool/dbconfig/20230308-154709-marostegui.json
15:46 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
15:42 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
15:33 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45526 and previous config saved to /var/cache/conftool/dbconfig/20230308-153202-marostegui.json
15:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
15:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
15:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
15:22 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Fix typo in rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 06m 41s)
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45525 and previous config saved to /var/cache/conftool/dbconfig/20230308-151656-marostegui.json
15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:05 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Declare rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 11m 33s)
15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45524 and previous config saved to /var/cache/conftool/dbconfig/20230308-150150-marostegui.json
14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45523 and previous config saved to /var/cache/conftool/dbconfig/20230308-145245-marostegui.json
14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45522 and previous config saved to /var/cache/conftool/dbconfig/20230308-144934-marostegui.json
14:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45521 and previous config saved to /var/cache/conftool/dbconfig/20230308-144924-marostegui.json
14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45520 and previous config saved to /var/cache/conftool/dbconfig/20230308-144659-marostegui.json
14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45519 and previous config saved to /var/cache/conftool/dbconfig/20230308-144634-marostegui.json
14:46 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:45 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:43 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:41 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:41 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:38 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:37 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45518 and previous config saved to /var/cache/conftool/dbconfig/20230308-143739-marostegui.json
14:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45517 and previous config saved to /var/cache/conftool/dbconfig/20230308-143418-marostegui.json
14:34 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:33 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:32 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:32 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45516 and previous config saved to /var/cache/conftool/dbconfig/20230308-143127-marostegui.json
14:25 inflatador: bking@cumin2002 powering down elastic1060-66 for re-rack T322082
14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45514 and previous config saved to /var/cache/conftool/dbconfig/20230308-142233-marostegui.json
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45513 and previous config saved to /var/cache/conftool/dbconfig/20230308-141911-marostegui.json
14:16 TheresNoTime: close UTC afternoon backport window
14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45511 and previous config saved to /var/cache/conftool/dbconfig/20230308-141621-marostegui.json
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45510 and previous config saved to /var/cache/conftool/dbconfig/20230308-140727-marostegui.json
14:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45509 and previous config saved to /var/cache/conftool/dbconfig/20230308-140405-marostegui.json
14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45508 and previous config saved to /var/cache/conftool/dbconfig/20230308-140115-marostegui.json
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45507 and previous config saved to /var/cache/conftool/dbconfig/20230308-135153-marostegui.json
13:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45506 and previous config saved to /var/cache/conftool/dbconfig/20230308-135132-marostegui.json
13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45505 and previous config saved to /var/cache/conftool/dbconfig/20230308-134945-marostegui.json
13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45504 and previous config saved to /var/cache/conftool/dbconfig/20230308-134034-marostegui.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45503 and previous config saved to /var/cache/conftool/dbconfig/20230308-134002-marostegui.json
13:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
13:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45502 and previous config saved to /var/cache/conftool/dbconfig/20230308-133940-marostegui.json
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45501 and previous config saved to /var/cache/conftool/dbconfig/20230308-133626-marostegui.json
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45500 and previous config saved to /var/cache/conftool/dbconfig/20230308-132528-marostegui.json
13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45499 and previous config saved to /var/cache/conftool/dbconfig/20230308-132434-marostegui.json
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45498 and previous config saved to /var/cache/conftool/dbconfig/20230308-132120-marostegui.json
13:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host urldownloader1003.wikimedia.org with OS bullseye
13:11 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
13:11 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45497 and previous config saved to /var/cache/conftool/dbconfig/20230308-131022-marostegui.json
13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45496 and previous config saved to /var/cache/conftool/dbconfig/20230308-130928-marostegui.json
13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45495 and previous config saved to /var/cache/conftool/dbconfig/20230308-130613-marostegui.json
13:02 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:00 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45494 and previous config saved to /var/cache/conftool/dbconfig/20230308-125548-marostegui.json
12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45493 and previous config saved to /var/cache/conftool/dbconfig/20230308-125527-marostegui.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45492 and previous config saved to /var/cache/conftool/dbconfig/20230308-125515-marostegui.json
12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45491 and previous config saved to /var/cache/conftool/dbconfig/20230308-125422-marostegui.json
12:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45490 and previous config saved to /var/cache/conftool/dbconfig/20230308-124945-marostegui.json
12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45489 and previous config saved to /var/cache/conftool/dbconfig/20230308-124924-marostegui.json
12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45488 and previous config saved to /var/cache/conftool/dbconfig/20230308-124344-marostegui.json
12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45487 and previous config saved to /var/cache/conftool/dbconfig/20230308-124334-marostegui.json
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45486 and previous config saved to /var/cache/conftool/dbconfig/20230308-124021-marostegui.json
12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45485 and previous config saved to /var/cache/conftool/dbconfig/20230308-123418-marostegui.json
12:31 hnowlan: running authdns-update for r/890398
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45484 and previous config saved to /var/cache/conftool/dbconfig/20230308-122827-marostegui.json
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45483 and previous config saved to /var/cache/conftool/dbconfig/20230308-122515-marostegui.json
12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45482 and previous config saved to /var/cache/conftool/dbconfig/20230308-121912-marostegui.json
12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bullseye
12:14 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45480 and previous config saved to /var/cache/conftool/dbconfig/20230308-121321-marostegui.json
12:10 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45479 and previous config saved to /var/cache/conftool/dbconfig/20230308-121009-marostegui.json
12:09 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host urldownloader1003.wikimedia.org with OS bullseye
12:08 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45478 and previous config saved to /var/cache/conftool/dbconfig/20230308-120406-marostegui.json
12:01 claime: restbase-async back in standard state - T330651
12:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
12:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T330651
11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45477 and previous config saved to /var/cache/conftool/dbconfig/20230308-115935-marostegui.json
11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45476 and previous config saved to /var/cache/conftool/dbconfig/20230308-115924-marostegui.json
11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45475 and previous config saved to /var/cache/conftool/dbconfig/20230308-115913-marostegui.json
11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45474 and previous config saved to /var/cache/conftool/dbconfig/20230308-115903-marostegui.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45473 and previous config saved to /var/cache/conftool/dbconfig/20230308-115815-marostegui.json
11:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
11:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
11:55 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
11:55 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T330651
11:55 claime: restbase-async pooled in eqiad, depooling in codfw - T330651
11:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool restbase-async in eqiad: T330651
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45472 and previous config saved to /var/cache/conftool/dbconfig/20230308-115252-root.json
11:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
11:49 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
11:49 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool restbase-async in eqiad: T330651
11:49 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9] (duration: 01m 30s)
11:48 claime: Starting restbase-async switchback - T330651
11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9]
11:47 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9] (duration: 00m 07s)
11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9]
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45471 and previous config saved to /var/cache/conftool/dbconfig/20230308-114652-marostegui.json
11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45470 and previous config saved to /var/cache/conftool/dbconfig/20230308-114642-marostegui.json
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45469 and previous config saved to /var/cache/conftool/dbconfig/20230308-114553-root.json
11:44 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bullseye
11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45468 and previous config saved to /var/cache/conftool/dbconfig/20230308-114407-marostegui.json
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45467 and previous config saved to /var/cache/conftool/dbconfig/20230308-114357-marostegui.json
11:42 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 05m 09s)
11:37 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
11:37 otto@deploy2002: deploy aborted: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 09m 38s)
11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45466 and previous config saved to /var/cache/conftool/dbconfig/20230308-113136-marostegui.json
11:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45465 and previous config saved to /var/cache/conftool/dbconfig/20230308-112901-marostegui.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45464 and previous config saved to /var/cache/conftool/dbconfig/20230308-112850-marostegui.json
11:27 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
11:27 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:27 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:26 akosiaris: T307943 upgrade kubernetes-client on deploy1002 deploy2002
11:25 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
11:23 claime: Traffic: authdns updated successfully for eqiad repool - T331285
11:21 claime: Traffic: repool eqiad for user traffic - T331285
11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45463 and previous config saved to /var/cache/conftool/dbconfig/20230308-111628-marostegui.json
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45462 and previous config saved to /var/cache/conftool/dbconfig/20230308-111355-marostegui.json
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45461 and previous config saved to /var/cache/conftool/dbconfig/20230308-111344-marostegui.json
11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45460 and previous config saved to /var/cache/conftool/dbconfig/20230308-110907-marostegui.json
11:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45459 and previous config saved to /var/cache/conftool/dbconfig/20230308-110846-marostegui.json
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45458 and previous config saved to /var/cache/conftool/dbconfig/20230308-110306-marostegui.json
11:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45457 and previous config saved to /var/cache/conftool/dbconfig/20230308-110121-marostegui.json
10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45456 and previous config saved to /var/cache/conftool/dbconfig/20230308-105347-marostegui.json
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45455 and previous config saved to /var/cache/conftool/dbconfig/20230308-105339-marostegui.json
10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:51 otto@deploy2002: Finished deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334] (duration: 08m 20s)
10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45454 and previous config saved to /var/cache/conftool/dbconfig/20230308-105043-marostegui.json
10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45453 and previous config saved to /var/cache/conftool/dbconfig/20230308-105022-marostegui.json
10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:49 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:42 otto@deploy2002: Started deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334]
10:40 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45452 and previous config saved to /var/cache/conftool/dbconfig/20230308-103840-marostegui.json
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45451 and previous config saved to /var/cache/conftool/dbconfig/20230308-103833-marostegui.json
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45450 and previous config saved to /var/cache/conftool/dbconfig/20230308-103515-marostegui.json
10:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45449 and previous config saved to /var/cache/conftool/dbconfig/20230308-102334-marostegui.json
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45448 and previous config saved to /var/cache/conftool/dbconfig/20230308-102326-marostegui.json
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45447 and previous config saved to /var/cache/conftool/dbconfig/20230308-102009-marostegui.json
10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45446 and previous config saved to /var/cache/conftool/dbconfig/20230308-101944-marostegui.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45445 and previous config saved to /var/cache/conftool/dbconfig/20230308-100826-marostegui.json
10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45444 and previous config saved to /var/cache/conftool/dbconfig/20230308-100502-marostegui.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45443 and previous config saved to /var/cache/conftool/dbconfig/20230308-100437-marostegui.json
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45442 and previous config saved to /var/cache/conftool/dbconfig/20230308-095804-marostegui.json
09:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
09:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45441 and previous config saved to /var/cache/conftool/dbconfig/20230308-095742-marostegui.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45440 and previous config saved to /var/cache/conftool/dbconfig/20230308-095320-marostegui.json
09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45439 and previous config saved to /var/cache/conftool/dbconfig/20230308-095259-marostegui.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45438 and previous config saved to /var/cache/conftool/dbconfig/20230308-094931-marostegui.json
09:45 claime: Rebuilding production-images for 894687
09:43 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45437 and previous config saved to /var/cache/conftool/dbconfig/20230308-094236-marostegui.json
09:42 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45436 and previous config saved to /var/cache/conftool/dbconfig/20230308-093752-marostegui.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45435 and previous config saved to /var/cache/conftool/dbconfig/20230308-093424-marostegui.json
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45434 and previous config saved to /var/cache/conftool/dbconfig/20230308-093106-marostegui.json
09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45433 and previous config saved to /var/cache/conftool/dbconfig/20230308-093045-marostegui.json
09:30 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45432 and previous config saved to /var/cache/conftool/dbconfig/20230308-092729-marostegui.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45431 and previous config saved to /var/cache/conftool/dbconfig/20230308-092246-marostegui.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45430 and previous config saved to /var/cache/conftool/dbconfig/20230308-091538-marostegui.json
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45429 and previous config saved to /var/cache/conftool/dbconfig/20230308-091223-marostegui.json
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45428 and previous config saved to /var/cache/conftool/dbconfig/20230308-090739-marostegui.json
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45426 and previous config saved to /var/cache/conftool/dbconfig/20230308-090156-marostegui.json
09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45425 and previous config saved to /var/cache/conftool/dbconfig/20230308-090134-marostegui.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45424 and previous config saved to /var/cache/conftool/dbconfig/20230308-090031-marostegui.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45423 and previous config saved to /var/cache/conftool/dbconfig/20230308-085608-marostegui.json
08:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45422 and previous config saved to /var/cache/conftool/dbconfig/20230308-085546-marostegui.json
08:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:53 akosiaris: remove 10.64.64.0/21 and 10.192.64.0/21 from calico GlobalNetworkPolicies T326617
08:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45421 and previous config saved to /var/cache/conftool/dbconfig/20230308-085159-root.json
08:50 vgutierrez: re-enable HAProxy systemd service unit hardening in ulsfo - T323944
08:49 moritzm: installing git security updates
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45420 and previous config saved to /var/cache/conftool/dbconfig/20230308-084628-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45419 and previous config saved to /var/cache/conftool/dbconfig/20230308-084525-marostegui.json
08:41 marostegui: Deploy schema change on s3 eqiad dbmaint T329203
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45418 and previous config saved to /var/cache/conftool/dbconfig/20230308-084053-marostegui.json
08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45417 and previous config saved to /var/cache/conftool/dbconfig/20230308-084040-marostegui.json
08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45416 and previous config saved to /var/cache/conftool/dbconfig/20230308-083843-marostegui.json
08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
08:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 15%: Repooling', diff saved to https://phabricator.wikimedia.org/P45415 and previous config saved to /var/cache/conftool/dbconfig/20230308-083731-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45414 and previous config saved to /var/cache/conftool/dbconfig/20230308-083654-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P45413 and previous config saved to /var/cache/conftool/dbconfig/20230308-083618-marostegui.json
08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
08:34 marostegui: Deploy schema change on s3 eqiad dbmaint T329260
08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
08:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
08:32 marostegui: Deploy schema change on s5 eqiad dbmaint T329260
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45412 and previous config saved to /var/cache/conftool/dbconfig/20230308-083121-marostegui.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45411 and previous config saved to /var/cache/conftool/dbconfig/20230308-082533-marostegui.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45410 and previous config saved to /var/cache/conftool/dbconfig/20230308-082149-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45409 and previous config saved to /var/cache/conftool/dbconfig/20230308-082112-marostegui.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45408 and previous config saved to /var/cache/conftool/dbconfig/20230308-081809-marostegui.json
08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45407 and previous config saved to /var/cache/conftool/dbconfig/20230308-081748-marostegui.json
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45406 and previous config saved to /var/cache/conftool/dbconfig/20230308-081614-marostegui.json
08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 19 hosts with reason: Schema change
08:15 marostegui: Deploy schema change on s8 eqiad dbmaint T329260
08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 19 hosts with reason: Schema change
08:15 marostegui: Deploy schema change on s7 eqiad dbmaint T329260
08:15 marostegui: Deploy schema change on s4 eqiad dbmaint T329260
08:15 marostegui: Deploy schema change on s1 eqiad dbmaint T329260
08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2093.codfw.wmnet
08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45405 and previous config saved to /var/cache/conftool/dbconfig/20230308-081027-marostegui.json
08:09 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
08:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45404 and previous config saved to /var/cache/conftool/dbconfig/20230308-080644-root.json
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45403 and previous config saved to /var/cache/conftool/dbconfig/20230308-080431-marostegui.json
08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
08:02 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2093.codfw.wmnet
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45402 and previous config saved to /var/cache/conftool/dbconfig/20230308-080241-marostegui.json
08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 20 hosts with reason: Schema change
08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 20 hosts with reason: Schema change
08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 22 hosts with reason: Schema change
08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 22 hosts with reason: Schema change
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45401 and previous config saved to /var/cache/conftool/dbconfig/20230308-075857-marostegui.json
07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
07:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45400 and previous config saved to /var/cache/conftool/dbconfig/20230308-075139-root.json
07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:47 taavi@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard (duration: 04m 56s)
07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45399 and previous config saved to /var/cache/conftool/dbconfig/20230308-074735-marostegui.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1109', diff saved to https://phabricator.wikimedia.org/P45398 and previous config saved to /var/cache/conftool/dbconfig/20230308-074427-marostegui.json
07:42 taavi@deploy2002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45397 and previous config saved to /var/cache/conftool/dbconfig/20230308-073633-root.json
07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45396 and previous config saved to /var/cache/conftool/dbconfig/20230308-073228-marostegui.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T330991', diff saved to https://phabricator.wikimedia.org/P45395 and previous config saved to /var/cache/conftool/dbconfig/20230308-073110-root.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1126 to s8 primary T330991', diff saved to https://phabricator.wikimedia.org/P45394 and previous config saved to /var/cache/conftool/dbconfig/20230308-073005-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45393 and previous config saved to /var/cache/conftool/dbconfig/20230308-072932-marostegui.json
07:29 marostegui: Starting s8 eqiad failover from db1109 to db1126 - T330991
07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45392 and previous config saved to /var/cache/conftool/dbconfig/20230308-072128-root.json
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 0 T330991', diff saved to https://phabricator.wikimedia.org/P45391 and previous config saved to /var/cache/conftool/dbconfig/20230308-070544-root.json
07:05 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
07:05 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45390 and previous config saved to /var/cache/conftool/dbconfig/20230308-070458-marostegui.json
07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
06:53 marostegui: Failover m3 from db1101 to db1159 - T331387
06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45389 and previous config saved to /var/cache/conftool/dbconfig/20230308-055038-marostegui.json
05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45388 and previous config saved to /var/cache/conftool/dbconfig/20230308-053531-marostegui.json
05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45387 and previous config saved to /var/cache/conftool/dbconfig/20230308-052024-marostegui.json
05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45386 and previous config saved to /var/cache/conftool/dbconfig/20230308-050517-marostegui.json
04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45385 and previous config saved to /var/cache/conftool/dbconfig/20230308-040451-marostegui.json
04:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
04:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45384 and previous config saved to /var/cache/conftool/dbconfig/20230308-040430-marostegui.json
03:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45383 and previous config saved to /var/cache/conftool/dbconfig/20230308-034923-marostegui.json
03:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45382 and previous config saved to /var/cache/conftool/dbconfig/20230308-033416-marostegui.json
03:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45381 and previous config saved to /var/cache/conftool/dbconfig/20230308-031910-marostegui.json
03:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45380 and previous config saved to /var/cache/conftool/dbconfig/20230308-031257-marostegui.json
03:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45379 and previous config saved to /var/cache/conftool/dbconfig/20230308-031246-marostegui.json
02:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45378 and previous config saved to /var/cache/conftool/dbconfig/20230308-025739-marostegui.json
02:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45377 and previous config saved to /var/cache/conftool/dbconfig/20230308-024536-marostegui.json
02:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45376 and previous config saved to /var/cache/conftool/dbconfig/20230308-024233-marostegui.json
02:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45375 and previous config saved to /var/cache/conftool/dbconfig/20230308-023029-marostegui.json
02:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45374 and previous config saved to /var/cache/conftool/dbconfig/20230308-022726-marostegui.json
02:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45373 and previous config saved to /var/cache/conftool/dbconfig/20230308-022116-marostegui.json
02:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
02:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45372 and previous config saved to /var/cache/conftool/dbconfig/20230308-022054-marostegui.json
02:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45371 and previous config saved to /var/cache/conftool/dbconfig/20230308-021523-marostegui.json
02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45370 and previous config saved to /var/cache/conftool/dbconfig/20230308-020547-marostegui.json
02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45369 and previous config saved to /var/cache/conftool/dbconfig/20230308-020016-marostegui.json
01:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45368 and previous config saved to /var/cache/conftool/dbconfig/20230308-015921-marostegui.json
01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45367 and previous config saved to /var/cache/conftool/dbconfig/20230308-015040-marostegui.json
01:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45366 and previous config saved to /var/cache/conftool/dbconfig/20230308-014659-marostegui.json
01:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45365 and previous config saved to /var/cache/conftool/dbconfig/20230308-014637-marostegui.json
01:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45364 and previous config saved to /var/cache/conftool/dbconfig/20230308-014415-marostegui.json
01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45363 and previous config saved to /var/cache/conftool/dbconfig/20230308-013534-marostegui.json
01:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45362 and previous config saved to /var/cache/conftool/dbconfig/20230308-013131-marostegui.json
01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45361 and previous config saved to /var/cache/conftool/dbconfig/20230308-012918-marostegui.json
01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45360 and previous config saved to /var/cache/conftool/dbconfig/20230308-012908-marostegui.json
01:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
01:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45359 and previous config saved to /var/cache/conftool/dbconfig/20230308-012901-marostegui.json
01:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45358 and previous config saved to /var/cache/conftool/dbconfig/20230308-011624-marostegui.json
01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45357 and previous config saved to /var/cache/conftool/dbconfig/20230308-011401-marostegui.json
01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45356 and previous config saved to /var/cache/conftool/dbconfig/20230308-011354-marostegui.json
01:09 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
01:08 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bullseye
01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45355 and previous config saved to /var/cache/conftool/dbconfig/20230308-010321-marostegui.json
01:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
01:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45354 and previous config saved to /var/cache/conftool/dbconfig/20230308-010300-marostegui.json
01:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45353 and previous config saved to /var/cache/conftool/dbconfig/20230308-010117-marostegui.json
00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45352 and previous config saved to /var/cache/conftool/dbconfig/20230308-005848-marostegui.json
00:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
00:51 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45351 and previous config saved to /var/cache/conftool/dbconfig/20230308-004753-marostegui.json
00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45350 and previous config saved to /var/cache/conftool/dbconfig/20230308-004744-marostegui.json
00:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45349 and previous config saved to /var/cache/conftool/dbconfig/20230308-004722-marostegui.json
00:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45348 and previous config saved to /var/cache/conftool/dbconfig/20230308-004341-marostegui.json
00:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45347 and previous config saved to /var/cache/conftool/dbconfig/20230308-004115-marostegui.json
00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45346 and previous config saved to /var/cache/conftool/dbconfig/20230308-004049-marostegui.json
00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45345 and previous config saved to /var/cache/conftool/dbconfig/20230308-003240-marostegui.json
00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45344 and previous config saved to /var/cache/conftool/dbconfig/20230308-003216-marostegui.json
00:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
00:29 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir1002.eqiad.wmnet with OS bullseye
00:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45343 and previous config saved to /var/cache/conftool/dbconfig/20230308-002543-marostegui.json
00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45342 and previous config saved to /var/cache/conftool/dbconfig/20230308-001734-marostegui.json
00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45341 and previous config saved to /var/cache/conftool/dbconfig/20230308-001709-marostegui.json
00:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45340 and previous config saved to /var/cache/conftool/dbconfig/20230308-001036-marostegui.json
00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45339 and previous config saved to /var/cache/conftool/dbconfig/20230308-000538-marostegui.json
00:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
00:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45338 and previous config saved to /var/cache/conftool/dbconfig/20230308-000516-marostegui.json
00:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45337 and previous config saved to /var/cache/conftool/dbconfig/20230308-000203-marostegui.json

2023-03-07

23:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45336 and previous config saved to /var/cache/conftool/dbconfig/20230307-235529-marostegui.json
23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45335 and previous config saved to /var/cache/conftool/dbconfig/20230307-235010-marostegui.json
23:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45334 and previous config saved to /var/cache/conftool/dbconfig/20230307-234858-marostegui.json
23:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
23:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
23:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45333 and previous config saved to /var/cache/conftool/dbconfig/20230307-234837-marostegui.json
23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45332 and previous config saved to /var/cache/conftool/dbconfig/20230307-234741-marostegui.json
23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45331 and previous config saved to /var/cache/conftool/dbconfig/20230307-234715-marostegui.json
23:40 ryankemper@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper (duration: 00m 15s)
23:39 ryankemper@deploy2002: Started deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper
23:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45329 and previous config saved to /var/cache/conftool/dbconfig/20230307-233503-marostegui.json
23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45328 and previous config saved to /var/cache/conftool/dbconfig/20230307-233330-marostegui.json
23:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
23:32 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
23:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45327 and previous config saved to /var/cache/conftool/dbconfig/20230307-233209-marostegui.json
23:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
23:30 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
23:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45326 and previous config saved to /var/cache/conftool/dbconfig/20230307-231957-marostegui.json
23:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45325 and previous config saved to /var/cache/conftool/dbconfig/20230307-231824-marostegui.json
23:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45324 and previous config saved to /var/cache/conftool/dbconfig/20230307-231702-marostegui.json
23:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45323 and previous config saved to /var/cache/conftool/dbconfig/20230307-230317-marostegui.json
23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45322 and previous config saved to /var/cache/conftool/dbconfig/20230307-230156-marostegui.json
22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45321 and previous config saved to /var/cache/conftool/dbconfig/20230307-225951-marostegui.json
22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
22:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
22:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bullseye
22:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45319 and previous config saved to /var/cache/conftool/dbconfig/20230307-225110-marostegui.json
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45318 and previous config saved to /var/cache/conftool/dbconfig/20230307-224803-marostegui.json
22:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45317 and previous config saved to /var/cache/conftool/dbconfig/20230307-224742-marostegui.json
22:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bullseye
22:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
22:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
22:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45316 and previous config saved to /var/cache/conftool/dbconfig/20230307-223603-marostegui.json
22:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45315 and previous config saved to /var/cache/conftool/dbconfig/20230307-223235-marostegui.json
22:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
22:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
22:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2002.codfw.wmnet with OS bullseye
22:26 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
22:25 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
22:23 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bullseye
22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45314 and previous config saved to /var/cache/conftool/dbconfig/20230307-222056-marostegui.json
22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45313 and previous config saved to /var/cache/conftool/dbconfig/20230307-221931-marostegui.json
22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45312 and previous config saved to /var/cache/conftool/dbconfig/20230307-221854-marostegui.json
22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45311 and previous config saved to /var/cache/conftool/dbconfig/20230307-221729-marostegui.json
22:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1001.eqiad.wmnet with OS bullseye
22:14 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4002.ulsfo.wmnet
22:13 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4002.ulsfo.wmnet with OS bullseye
22:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
22:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45310 and previous config saved to /var/cache/conftool/dbconfig/20230307-220550-marostegui.json
22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45309 and previous config saved to /var/cache/conftool/dbconfig/20230307-220438-marostegui.json
22:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
22:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45308 and previous config saved to /var/cache/conftool/dbconfig/20230307-220416-marostegui.json
22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45307 and previous config saved to /var/cache/conftool/dbconfig/20230307-220348-marostegui.json
22:03 mforns@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: (no justification provided) (duration: 00m 18s)
22:03 mforns@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: (no justification provided)
22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45306 and previous config saved to /var/cache/conftool/dbconfig/20230307-220222-marostegui.json
21:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6002.drmrs.wmnet with OS bullseye
21:58 inflatador: bking@cumin2002 depool elastic row D hosts to prepare for T322082
21:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 7 hosts with reason: re-rack
21:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 7 hosts with reason: re-rack
21:56 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2001.codfw.wmnet with OS bullseye
21:56 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
21:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
21:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3002.esams.wmnet
21:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3002.esams.wmnet with OS bullseye
21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
21:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45305 and previous config saved to /var/cache/conftool/dbconfig/20230307-214910-marostegui.json
21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45304 and previous config saved to /var/cache/conftool/dbconfig/20230307-214841-marostegui.json
21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45303 and previous config saved to /var/cache/conftool/dbconfig/20230307-214824-marostegui.json
21:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45302 and previous config saved to /var/cache/conftool/dbconfig/20230307-214802-marostegui.json
21:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
21:43 TheresNoTime: close UTC late backport window
21:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
21:41 inflatador: bking@cumin2002 ban elastic row D hosts to prepare for T322082
21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2073.codfw.wmnet with OS bullseye
21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4002.ulsfo.wmnet with OS bullseye
21:38 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4002.ulsfo.wmnet
21:37 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4001.ulsfo.wmnet
21:37 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4001.ulsfo.wmnet with OS bullseye
21:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
21:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45301 and previous config saved to /var/cache/conftool/dbconfig/20230307-213403-marostegui.json
21:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45300 and previous config saved to /var/cache/conftool/dbconfig/20230307-213334-marostegui.json
21:33 samtar@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) (duration: 09m 11s)
21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45299 and previous config saved to /var/cache/conftool/dbconfig/20230307-213256-marostegui.json
21:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
21:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:27 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6002.drmrs.wmnet with OS bullseye
21:25 samtar@deploy2002: sbailey and samtar: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:23 samtar@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612)
21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45298 and previous config saved to /var/cache/conftool/dbconfig/20230307-212138-marostegui.json
21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
21:20 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.26 refs T330204
21:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
21:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45297 and previous config saved to /var/cache/conftool/dbconfig/20230307-211857-marostegui.json
21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45296 and previous config saved to /var/cache/conftool/dbconfig/20230307-211749-marostegui.json
21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45295 and previous config saved to /var/cache/conftool/dbconfig/20230307-211744-marostegui.json
21:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
21:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3002.esams.wmnet with OS bullseye
21:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45294 and previous config saved to /var/cache/conftool/dbconfig/20230307-211723-marostegui.json
21:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
21:17 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3002.esams.wmnet
21:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3001.esams.wmnet
21:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3001.esams.wmnet with OS bullseye
21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45293 and previous config saved to /var/cache/conftool/dbconfig/20230307-211159-marostegui.json
21:10 bblack: lvs500[45]: re-enabling/pooling, back to normal flow
21:10 jhuneidi@deploy2002: Pruned MediaWiki: 1.40.0-wmf.24 (duration: 02m 08s)
21:07 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 43m 53s)
21:07 bking@deploy2002: Finished deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk (duration: 00m 41s)
21:07 bking@deploy2002: Started deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk
21:06 bblack: lvs500[45]: disabling puppet and stopping pybal, all eqsin traffic through lvs5006 temporarily...
21:03 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4001.ulsfo.wmnet with OS bullseye
21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.ulsfo.wmnet
21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45292 and previous config saved to /var/cache/conftool/dbconfig/20230307-210243-marostegui.json
21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.drmrs.wmnet
21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45291 and previous config saved to /var/cache/conftool/dbconfig/20230307-210216-marostegui.json
20:58 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: test deploy new airflow instance (duration: 02m 03s)
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45290 and previous config saved to /var/cache/conftool/dbconfig/20230307-205653-marostegui.json
20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
20:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
20:56 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 01s)
20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
20:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45289 and previous config saved to /var/cache/conftool/dbconfig/20230307-204925-marostegui.json
20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45288 and previous config saved to /var/cache/conftool/dbconfig/20230307-204904-marostegui.json
20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45287 and previous config saved to /var/cache/conftool/dbconfig/20230307-204710-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45286 and previous config saved to /var/cache/conftool/dbconfig/20230307-204146-marostegui.json
20:35 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3001.esams.wmnet with OS bullseye
20:35 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3001.drmrs.wmnet
20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45284 and previous config saved to /var/cache/conftool/dbconfig/20230307-203357-marostegui.json
20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45283 and previous config saved to /var/cache/conftool/dbconfig/20230307-203203-marostegui.json
20:30 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 02s)
20:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
20:30 ebernhardson@deploy2002: Finished deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance (duration: 00m 05s)
20:29 ebernhardson@deploy2002: Started deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance
20:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2073.codfw.wmnet with OS bullseye
20:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45282 and previous config saved to /var/cache/conftool/dbconfig/20230307-202713-marostegui.json
20:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
20:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45281 and previous config saved to /var/cache/conftool/dbconfig/20230307-202652-marostegui.json
20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45280 and previous config saved to /var/cache/conftool/dbconfig/20230307-202640-marostegui.json
20:24 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6002.eqsin.wmnet
20:19 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6002.drmrs.wmnet with OS bullseye
20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45279 and previous config saved to /var/cache/conftool/dbconfig/20230307-201851-marostegui.json
20:17 bking@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk (duration: 01m 18s)
20:16 bking@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk
20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45277 and previous config saved to /var/cache/conftool/dbconfig/20230307-201414-marostegui.json
20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
20:14 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 01m 49s)
20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45276 and previous config saved to /var/cache/conftool/dbconfig/20230307-201353-marostegui.json
20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45274 and previous config saved to /var/cache/conftool/dbconfig/20230307-201145-marostegui.json
20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45273 and previous config saved to /var/cache/conftool/dbconfig/20230307-200344-marostegui.json
20:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
19:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45272 and previous config saved to /var/cache/conftool/dbconfig/20230307-195846-marostegui.json
19:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45270 and previous config saved to /var/cache/conftool/dbconfig/20230307-195639-marostegui.json
19:51 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS bullseye
19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45268 and previous config saved to /var/cache/conftool/dbconfig/20230307-194934-marostegui.json
19:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
19:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45267 and previous config saved to /var/cache/conftool/dbconfig/20230307-194913-marostegui.json
19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45266 and previous config saved to /var/cache/conftool/dbconfig/20230307-194340-marostegui.json
19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45265 and previous config saved to /var/cache/conftool/dbconfig/20230307-194132-marostegui.json
19:40 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 00m 07s)
19:40 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6002.drmrs.wmnet with OS bullseye
19:40 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
19:40 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6002.eqsin.wmnet
19:40 ejegg: payments-wiki upgraded from 346e6f61 to 05a5e09a
19:39 jhuneidi@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.40.0-wmf.26" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 255. (duration: 00m 02s)
19:39 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
19:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
19:37 brett@cumin2002: conftool action : set/pooled=yess; selector: name=ncredir6001.eqsin.wmnet
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45264 and previous config saved to /var/cache/conftool/dbconfig/20230307-193639-marostegui.json
19:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
19:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45263 and previous config saved to /var/cache/conftool/dbconfig/20230307-193617-marostegui.json
19:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
19:35 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS bullseye
19:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45262 and previous config saved to /var/cache/conftool/dbconfig/20230307-193406-marostegui.json
19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
19:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
19:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45261 and previous config saved to /var/cache/conftool/dbconfig/20230307-192833-marostegui.json
19:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
19:21 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6001.drmrs.wmnet with OS bullseye
19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45260 and previous config saved to /var/cache/conftool/dbconfig/20230307-192111-marostegui.json
19:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45259 and previous config saved to /var/cache/conftool/dbconfig/20230307-191900-marostegui.json
19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45258 and previous config saved to /var/cache/conftool/dbconfig/20230307-191717-marostegui.json
19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
19:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
19:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45257 and previous config saved to /var/cache/conftool/dbconfig/20230307-191656-marostegui.json
19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2072.codfw.wmnet with OS bullseye
19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:08 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5002.eqsin.wmnet with OS bullseye
19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4002.ulsfo.wmnet with OS bullseye
19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45256 and previous config saved to /var/cache/conftool/dbconfig/20230307-190604-marostegui.json
19:04 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45255 and previous config saved to /var/cache/conftool/dbconfig/20230307-190353-marostegui.json
19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45254 and previous config saved to /var/cache/conftool/dbconfig/20230307-190149-marostegui.json
19:01 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS bullseye
18:59 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 12m 38s)
18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
18:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45253 and previous config saved to /var/cache/conftool/dbconfig/20230307-185058-marostegui.json
18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45252 and previous config saved to /var/cache/conftool/dbconfig/20230307-184907-marostegui.json
18:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
18:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
18:47 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45251 and previous config saved to /var/cache/conftool/dbconfig/20230307-184642-marostegui.json
18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45250 and previous config saved to /var/cache/conftool/dbconfig/20230307-184506-marostegui.json
18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45249 and previous config saved to /var/cache/conftool/dbconfig/20230307-184428-marostegui.json
18:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
18:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
18:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6001.drmrs.wmnet with OS bullseye
18:39 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6001.eqsin.wmnet
18:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
18:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
18:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
18:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
18:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45248 and previous config saved to /var/cache/conftool/dbconfig/20230307-183810-marostegui.json
18:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
18:35 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir5002.eqsin.wmnet
18:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45247 and previous config saved to /var/cache/conftool/dbconfig/20230307-183136-marostegui.json
18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45246 and previous config saved to /var/cache/conftool/dbconfig/20230307-182921-marostegui.json
18:29 dancy: dancy@deploy2002: Fixing up /srv/mediawiki-staging/.git permissions
18:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2072.codfw.wmnet with OS bullseye
18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2071.codfw.wmnet with OS bullseye
18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45245 and previous config saved to /var/cache/conftool/dbconfig/20230307-182304-marostegui.json
18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45244 and previous config saved to /var/cache/conftool/dbconfig/20230307-182035-marostegui.json
18:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
18:20 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
18:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45243 and previous config saved to /var/cache/conftool/dbconfig/20230307-182013-marostegui.json
18:19 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5001.eqsin.wmnet with OS bullseye
18:18 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4001.ulsfo.wmnet with OS bullseye
18:17 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3002.esams.wmnet with OS bullseye
18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5002.eqsin.wmnet with OS bullseye
18:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45242 and previous config saved to /var/cache/conftool/dbconfig/20230307-181414-marostegui.json
18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45241 and previous config saved to /var/cache/conftool/dbconfig/20230307-180757-marostegui.json
18:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45240 and previous config saved to /var/cache/conftool/dbconfig/20230307-180506-marostegui.json
18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45239 and previous config saved to /var/cache/conftool/dbconfig/20230307-175907-marostegui.json
17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45238 and previous config saved to /var/cache/conftool/dbconfig/20230307-175314-marostegui.json
17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45237 and previous config saved to /var/cache/conftool/dbconfig/20230307-175251-marostegui.json
17:51 inflatador: bking@cumin2002 repool wdqs hosts post-maintenance T329073
17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45236 and previous config saved to /var/cache/conftool/dbconfig/20230307-175000-marostegui.json
17:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
17:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45235 and previous config saved to /var/cache/conftool/dbconfig/20230307-174848-marostegui.json
17:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
17:47 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
17:47 volans@cumin1001: START - Cookbook sre.network.cf
17:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
17:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3002.esams.wmnet with OS bullseye
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45234 and previous config saved to /var/cache/conftool/dbconfig/20230307-173923-marostegui.json
17:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
17:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45233 and previous config saved to /var/cache/conftool/dbconfig/20230307-173901-marostegui.json
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45232 and previous config saved to /var/cache/conftool/dbconfig/20230307-173453-marostegui.json
17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45231 and previous config saved to /var/cache/conftool/dbconfig/20230307-173341-marostegui.json
17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3001.esams.wmnet with OS bullseye
17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45229 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45230 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
17:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bullseye
17:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45228 and previous config saved to /var/cache/conftool/dbconfig/20230307-172333-marostegui.json
17:22 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5002.eqsin.wmnet with OS bullseye
17:21 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir5002.eqsin.wmnet
17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45227 and previous config saved to /var/cache/conftool/dbconfig/20230307-171834-marostegui.json
17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
17:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
17:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45226 and previous config saved to /var/cache/conftool/dbconfig/20230307-170848-marostegui.json
17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45225 and previous config saved to /var/cache/conftool/dbconfig/20230307-170826-marostegui.json
17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45224 and previous config saved to /var/cache/conftool/dbconfig/20230307-170328-marostegui.json
17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45223 and previous config saved to /var/cache/conftool/dbconfig/20230307-170215-marostegui.json
17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45222 and previous config saved to /var/cache/conftool/dbconfig/20230307-170154-marostegui.json
16:58 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
16:57 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3001.esams.wmnet with OS bullseye
16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45221 and previous config saved to /var/cache/conftool/dbconfig/20230307-165340-marostegui.json
16:53 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2002.codfw.wmnet with OS bullseye
16:53 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@9924c93]: (no justification provided) (duration: 00m 11s)
16:53 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@9924c93]: (no justification provided)
16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45220 and previous config saved to /var/cache/conftool/dbconfig/20230307-165319-marostegui.json
16:52 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS bullseye
16:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
16:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45219 and previous config saved to /var/cache/conftool/dbconfig/20230307-164647-marostegui.json
16:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45218 and previous config saved to /var/cache/conftool/dbconfig/20230307-164010-marostegui.json
16:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45217 and previous config saved to /var/cache/conftool/dbconfig/20230307-163948-marostegui.json
16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45216 and previous config saved to /var/cache/conftool/dbconfig/20230307-163813-marostegui.json
16:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45215 and previous config saved to /var/cache/conftool/dbconfig/20230307-163140-marostegui.json
16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45214 and previous config saved to /var/cache/conftool/dbconfig/20230307-162616-marostegui.json
16:26 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45213 and previous config saved to /var/cache/conftool/dbconfig/20230307-162554-marostegui.json
16:25 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2001.codfw.wmnet with OS bullseye
16:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2071.codfw.wmnet with OS bullseye
16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45212 and previous config saved to /var/cache/conftool/dbconfig/20230307-162442-marostegui.json
16:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes2016.codfw.wmnet
16:21 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bullseye
16:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1037']
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45211 and previous config saved to /var/cache/conftool/dbconfig/20230307-161634-marostegui.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45210 and previous config saved to /var/cache/conftool/dbconfig/20230307-161132-marostegui.json
16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45209 and previous config saved to /var/cache/conftool/dbconfig/20230307-161111-marostegui.json
16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45208 and previous config saved to /var/cache/conftool/dbconfig/20230307-161047-marostegui.json
16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45207 and previous config saved to /var/cache/conftool/dbconfig/20230307-160935-marostegui.json
16:08 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
16:04 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bullseye
16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
15:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45206 and previous config saved to /var/cache/conftool/dbconfig/20230307-155604-marostegui.json
15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45205 and previous config saved to /var/cache/conftool/dbconfig/20230307-155541-marostegui.json
15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45204 and previous config saved to /var/cache/conftool/dbconfig/20230307-155428-marostegui.json
15:53 marostegui: Failover m1-master T330165
15:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
15:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
15:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
15:46 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
15:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
15:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
15:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45203 and previous config saved to /var/cache/conftool/dbconfig/20230307-154058-marostegui.json
15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45202 and previous config saved to /var/cache/conftool/dbconfig/20230307-154049-marostegui.json
15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45201 and previous config saved to /var/cache/conftool/dbconfig/20230307-154034-marostegui.json
15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
15:36 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1002.eqiad.wmnet with OS bullseye
15:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bullseye
15:29 moritzm: installing libde265 security updates
15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
15:28 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45200 and previous config saved to /var/cache/conftool/dbconfig/20230307-152729-marostegui.json
15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
15:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
15:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5001.eqsin.wmnet with OS bullseye
15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45199 and previous config saved to /var/cache/conftool/dbconfig/20230307-152545-marostegui.json
15:25 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync
15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: sync
15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
15:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: sync
15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
15:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: sync
15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: sync
15:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1039']
15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
15:20 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: sync
15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45198 and previous config saved to /var/cache/conftool/dbconfig/20230307-152037-marostegui.json
15:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1039']
15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: sync
15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
15:19 Emperor: pool thanos-fe1001 T329073
15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
15:19 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
15:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
15:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
15:16 Emperor: pool ms-fe1009 T329073
15:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
15:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
15:16 Emperor: pool moss-fe1001 T329073
15:15 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
15:15 akosiaris@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
15:15 akosiaris@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
15:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: sync
15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: sync
15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync
15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: sync
15:11 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: sync
15:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: sync
15:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1001.eqiad.wmnet with OS bullseye
15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync
15:04 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
15:04 bblack: dns1001 - restarted prometheus-bird-exporter
15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: sync
15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: sync
15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: sync
15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/SERVICE_NAME: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/SERVICE_NAME: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
15:01 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
15:01 sukhe: repooling dns1001: authdns-update can now be run again
15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: sync
14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase101[69].eqiad.wmnet
14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[18].eqiad.wmnet
14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1031.eqiad.wmnet
14:58 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: sync
14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: sync
14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync
14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: sync
14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
14:56 inflatador: bking@cumin2002 unban production row A elastic nodes from all clusters T329073
14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync
14:55 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: sync
14:54 akosiaris: T331126 toolhub deployed, https://toolhub.wikimedia.org/ operational again
14:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
14:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
14:52 inflatador: bking@cumin2002 unban row A cloudelastic nodes T329073
14:47 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T331126
14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet
14:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 238 hosts
14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 238 hosts
14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mr1-eqiad
14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for mr1-eqiad
14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:41 moritzm: enabling Puppet in eqiad/esams/drmrs after completed Switch maintenance T329073
14:40 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:40 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:36 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:29 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:20 topranks: issuing reboot to upgrade asw2-a-eqiad virtual-chassis to Junos 21.4
14:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
14:17 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
14:16 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
14:16 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
14:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1037']
14:13 akosiaris: kubectl cordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T329073
14:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2070.codfw.wmnet with OS bullseye
14:12 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
14:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
14:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1038']
14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
14:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
14:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
14:07 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
14:07 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1031.eqiad.wmnet
14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[18].eqiad.wmnet
14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase101[69].eqiad.wmnet
14:02 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
13:59 jbond: failover pki.discovery.wmnet to codfw T329073
13:58 Emperor: depool thanos-fe1001 T329073
13:55 Emperor: depool ms-fe1009 T329073
13:55 Emperor: depool moss-fe1001 T329073
13:54 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
13:50 moritzm: disabling Puppet in eqiad/esams/drmrs for forthcoming Switch maintenance T329073
13:50 topranks: staging Junos files to individual VC members eqiad row A to prep for upgrade
13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
13:04 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
13:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
12:57 sukhe: removing dns1001 from authdns_servers for T329073
12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
12:52 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
12:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
12:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
12:38 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
12:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
12:27 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
12:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1015.eqiad.wmnet with OS bullseye
12:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
12:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:17 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:15 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:14 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1016.eqiad.wmnet with OS bullseye
12:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:13 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:13 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:12 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:12 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:12 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:12 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:11 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:10 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:10 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:10 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
12:09 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:09 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:09 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:08 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:08 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:08 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:08 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:07 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:06 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
12:06 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:06 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:05 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:03 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
12:03 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
12:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:01 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:01 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:00 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
11:56 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
11:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
11:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
11:45 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:44 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:43 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
11:37 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1015.eqiad.wmnet with OS bullseye
11:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
11:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
11:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
11:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1005.eqiad.wmnet with OS bullseye
11:28 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1013.eqiad.wmnet with OS bullseye
11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
11:23 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1006.eqiad.wmnet with OS bullseye
11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
11:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
11:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
11:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45193 and previous config saved to /var/cache/conftool/dbconfig/20230307-111421-marostegui.json
11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
11:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
11:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
11:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
11:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1016.eqiad.wmnet with OS bullseye
11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45192 and previous config saved to /var/cache/conftool/dbconfig/20230307-105914-marostegui.json
10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
10:58 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
10:57 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
10:56 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
10:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1006.eqiad.wmnet with OS bullseye
10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1005.eqiad.wmnet with OS bullseye
10:53 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
10:51 akosiaris: manually label kubemaster1001, kubemaster1002 giving them role master T307943
10:48 arturo: apt2001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45191 and previous config saved to /var/cache/conftool/dbconfig/20230307-104408-marostegui.json
10:39 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1001.eqiad.wmnet with OS bullseye
10:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1002.eqiad.wmnet with OS bullseye
10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45190 and previous config saved to /var/cache/conftool/dbconfig/20230307-102901-marostegui.json
10:28 arturo: apt1001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
10:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
10:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45189 and previous config saved to /var/cache/conftool/dbconfig/20230307-100807-marostegui.json
10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45188 and previous config saved to /var/cache/conftool/dbconfig/20230307-100745-marostegui.json
10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1002.eqiad.wmnet with OS bullseye
10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1001.eqiad.wmnet with OS bullseye
10:05 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1005.eqiad.wmnet with OS bullseye
09:54 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1006.eqiad.wmnet with OS bullseye
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45187 and previous config saved to /var/cache/conftool/dbconfig/20230307-095239-marostegui.json
09:39 akosiaris: schedule downtime for PyBal backends health on lvs1019, lvs1020
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45186 and previous config saved to /var/cache/conftool/dbconfig/20230307-093732-marostegui.json
09:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1004.eqiad.wmnet with OS bullseye
09:33 moritzm: installing apr-util security updates on Bullseye
09:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45184 and previous config saved to /var/cache/conftool/dbconfig/20230307-092226-marostegui.json
09:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
09:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
09:14 moritzm: installing PHP 7.4 security updates (as packaged in Debian Bullseye, not our internal build for Buster)
09:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1006.eqiad.wmnet with OS bullseye
09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1005.eqiad.wmnet with OS bullseye
09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1004.eqiad.wmnet with OS bullseye
09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45182 and previous config saved to /var/cache/conftool/dbconfig/20230307-090130-marostegui.json
09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45181 and previous config saved to /var/cache/conftool/dbconfig/20230307-090109-marostegui.json
08:51 akosiaris: T331126 Scheduled 24H downtime for all wikikube eqiad hosts and all LVS services powered by the cluster
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45180 and previous config saved to /var/cache/conftool/dbconfig/20230307-084602-marostegui.json
08:43 dcausse: closing the UTC morning backport window
08:42 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1003.eqiad.wmnet with OS bullseye
08:41 dcausse@deploy2002: Finished scap: Backport for Properly pass the page id on page moves (T331127) (duration: 16m 34s)
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1101 from dbctl T329352', diff saved to https://phabricator.wikimedia.org/P45179 and previous config saved to /var/cache/conftool/dbconfig/20230307-083542-marostegui.json
08:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
08:33 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45178 and previous config saved to /var/cache/conftool/dbconfig/20230307-083056-marostegui.json
08:28 dcausse@deploy2002: dcausse: Backport for Properly pass the page id on page moves (T331127) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:24 dcausse@deploy2002: Started scap: Backport for Properly pass the page id on page moves (T331127)
08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:22 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
08:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
08:20 marostegui: Failover m3 from db1159 to db1101 - T331384
08:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
08:19 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
08:18 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45177 and previous config saved to /var/cache/conftool/dbconfig/20230307-081549-marostegui.json
08:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:14 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
08:14 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
08:09 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
08:07 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1003.eqiad.wmnet with OS bullseye
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45176 and previous config saved to /var/cache/conftool/dbconfig/20230307-075453-marostegui.json
07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45175 and previous config saved to /var/cache/conftool/dbconfig/20230307-075443-marostegui.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45174 and previous config saved to /var/cache/conftool/dbconfig/20230307-073936-marostegui.json
07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
07:34 vgutierrez: enable haproxy systemd service unit hardening in cp4044 - T323944
07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) T331381', diff saved to https://phabricator.wikimedia.org/P45172 and previous config saved to /var/cache/conftool/dbconfig/20230307-072454-root.json
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45171 and previous config saved to /var/cache/conftool/dbconfig/20230307-072429-marostegui.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45170 and previous config saved to /var/cache/conftool/dbconfig/20230307-070923-marostegui.json
06:54 marostegui: dbmaint eqiad s1 T329203
06:53 marostegui: dbmaint eqiad s4 T329203
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45169 and previous config saved to /var/cache/conftool/dbconfig/20230307-064752-marostegui.json
06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45168 and previous config saved to /var/cache/conftool/dbconfig/20230307-064730-marostegui.json
06:43 marostegui: dbmaint eqiad s4 T328817
06:43 marostegui: dbmaint eqiad s1 T328817
06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2095.codfw.wmnet
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:34 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45167 and previous config saved to /var/cache/conftool/dbconfig/20230307-063223-marostegui.json
06:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2095.codfw.wmnet
06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45166 and previous config saved to /var/cache/conftool/dbconfig/20230307-061717-marostegui.json
06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45165 and previous config saved to /var/cache/conftool/dbconfig/20230307-060210-marostegui.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45164 and previous config saved to /var/cache/conftool/dbconfig/20230307-054153-marostegui.json
05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45163 and previous config saved to /var/cache/conftool/dbconfig/20230307-054127-marostegui.json
05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45162 and previous config saved to /var/cache/conftool/dbconfig/20230307-052620-marostegui.json
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45161 and previous config saved to /var/cache/conftool/dbconfig/20230307-051113-marostegui.json
04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45160 and previous config saved to /var/cache/conftool/dbconfig/20230307-045607-marostegui.json
03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45159 and previous config saved to /var/cache/conftool/dbconfig/20230307-035541-marostegui.json
03:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
03:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45158 and previous config saved to /var/cache/conftool/dbconfig/20230307-035520-marostegui.json
03:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45157 and previous config saved to /var/cache/conftool/dbconfig/20230307-034013-marostegui.json
03:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45156 and previous config saved to /var/cache/conftool/dbconfig/20230307-032506-marostegui.json
03:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45155 and previous config saved to /var/cache/conftool/dbconfig/20230307-031000-marostegui.json
02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45154 and previous config saved to /var/cache/conftool/dbconfig/20230307-024912-marostegui.json
02:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
02:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
02:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45153 and previous config saved to /var/cache/conftool/dbconfig/20230307-024850-marostegui.json
02:34 eileen: civicrm upgraded from fe2c06f6 to dbe3b716
02:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45152 and previous config saved to /var/cache/conftool/dbconfig/20230307-023344-marostegui.json
02:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45151 and previous config saved to /var/cache/conftool/dbconfig/20230307-021837-marostegui.json
02:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45150 and previous config saved to /var/cache/conftool/dbconfig/20230307-020330-marostegui.json
01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45149 and previous config saved to /var/cache/conftool/dbconfig/20230307-014152-marostegui.json
01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45148 and previous config saved to /var/cache/conftool/dbconfig/20230307-014130-marostegui.json
01:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45147 and previous config saved to /var/cache/conftool/dbconfig/20230307-012624-marostegui.json
01:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45146 and previous config saved to /var/cache/conftool/dbconfig/20230307-011117-marostegui.json
00:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45145 and previous config saved to /var/cache/conftool/dbconfig/20230307-005611-marostegui.json
00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45144 and previous config saved to /var/cache/conftool/dbconfig/20230307-003547-marostegui.json
00:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45143 and previous config saved to /var/cache/conftool/dbconfig/20230307-003525-marostegui.json
00:23 mutante: people* - determined which users had a public_html dir in eqiad but not in codfw. created that dir, rsynced via push from people1003 to people2002 for the 7 affected users. re-enabled temporarily disabled puppet to restore live-hacked rsync config. T330091
00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45142 and previous config saved to /var/cache/conftool/dbconfig/20230307-002019-marostegui.json
00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45141 and previous config saved to /var/cache/conftool/dbconfig/20230307-000512-marostegui.json

2023-03-06

23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45140 and previous config saved to /var/cache/conftool/dbconfig/20230306-235006-marostegui.json
23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45139 and previous config saved to /var/cache/conftool/dbconfig/20230306-232933-marostegui.json
23:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
23:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
23:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
23:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
23:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
23:16 inflatador: bking@cumin2002 ban row A cloudelastic hosts T329073
23:11 mforns@deploy2002: Finished deploy [airflow-dags/analytics@53a0280]: (no justification provided) (duration: 00m 17s)
23:11 mforns@deploy2002: Started deploy [airflow-dags/analytics@53a0280]: (no justification provided)
23:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
23:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
23:05 ryankemper: T329073 Pre-emptively depooled internal wdqs hosts `wdqs10[03,11]`
23:04 inflatador: bking@cumin2002 'depool wcqs and wdqs row A hosts T329073'
22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
22:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
22:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45138 and previous config saved to /var/cache/conftool/dbconfig/20230306-223044-marostegui.json
22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45137 and previous config saved to /var/cache/conftool/dbconfig/20230306-221537-marostegui.json
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45136 and previous config saved to /var/cache/conftool/dbconfig/20230306-220031-marostegui.json
21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45135 and previous config saved to /var/cache/conftool/dbconfig/20230306-214524-marostegui.json
21:45 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45133 and previous config saved to /var/cache/conftool/dbconfig/20230306-212358-marostegui.json
21:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
21:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
21:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45132 and previous config saved to /var/cache/conftool/dbconfig/20230306-212336-marostegui.json
21:19 zabe@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) (duration: 16m 59s)
21:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45131 and previous config saved to /var/cache/conftool/dbconfig/20230306-210829-marostegui.json
21:04 zabe@deploy2002: zabe and sbailey: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:02 zabe@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612)
20:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45130 and previous config saved to /var/cache/conftool/dbconfig/20230306-205322-marostegui.json
20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45129 and previous config saved to /var/cache/conftool/dbconfig/20230306-203816-marostegui.json
20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45128 and previous config saved to /var/cache/conftool/dbconfig/20230306-201704-marostegui.json
20:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
20:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45127 and previous config saved to /var/cache/conftool/dbconfig/20230306-201643-marostegui.json
20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45126 and previous config saved to /var/cache/conftool/dbconfig/20230306-200843-marostegui.json
20:04 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45125 and previous config saved to /var/cache/conftool/dbconfig/20230306-200354-marostegui.json
20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45124 and previous config saved to /var/cache/conftool/dbconfig/20230306-200136-marostegui.json
19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45123 and previous config saved to /var/cache/conftool/dbconfig/20230306-195336-marostegui.json
19:51 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
19:49 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
19:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45122 and previous config saved to /var/cache/conftool/dbconfig/20230306-194848-marostegui.json
19:48 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
19:47 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45121 and previous config saved to /var/cache/conftool/dbconfig/20230306-194630-marostegui.json
19:45 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
19:44 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45120 and previous config saved to /var/cache/conftool/dbconfig/20230306-193829-marostegui.json
19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45119 and previous config saved to /var/cache/conftool/dbconfig/20230306-193341-marostegui.json
19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45118 and previous config saved to /var/cache/conftool/dbconfig/20230306-193123-marostegui.json
19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45117 and previous config saved to /var/cache/conftool/dbconfig/20230306-192322-marostegui.json
19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45116 and previous config saved to /var/cache/conftool/dbconfig/20230306-191835-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45115 and previous config saved to /var/cache/conftool/dbconfig/20230306-191622-marostegui.json
19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45114 and previous config saved to /var/cache/conftool/dbconfig/20230306-191600-marostegui.json
19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45113 and previous config saved to /var/cache/conftool/dbconfig/20230306-190943-marostegui.json
19:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45112 and previous config saved to /var/cache/conftool/dbconfig/20230306-190921-marostegui.json
19:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45111 and previous config saved to /var/cache/conftool/dbconfig/20230306-190054-marostegui.json
18:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1036']
18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45110 and previous config saved to /var/cache/conftool/dbconfig/20230306-185559-marostegui.json
18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45109 and previous config saved to /var/cache/conftool/dbconfig/20230306-185537-marostegui.json
18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45108 and previous config saved to /var/cache/conftool/dbconfig/20230306-185415-marostegui.json
18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45107 and previous config saved to /var/cache/conftool/dbconfig/20230306-184547-marostegui.json
18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45106 and previous config saved to /var/cache/conftool/dbconfig/20230306-184030-marostegui.json
18:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45105 and previous config saved to /var/cache/conftool/dbconfig/20230306-183908-marostegui.json
18:38 mutante: phabricator - locked and archived project acl*discovery-repository-admins (T324171)
18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
18:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45104 and previous config saved to /var/cache/conftool/dbconfig/20230306-183040-marostegui.json
18:25 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1036']
18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45103 and previous config saved to /var/cache/conftool/dbconfig/20230306-182524-marostegui.json
18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45102 and previous config saved to /var/cache/conftool/dbconfig/20230306-182508-marostegui.json
18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45101 and previous config saved to /var/cache/conftool/dbconfig/20230306-182447-marostegui.json
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45100 and previous config saved to /var/cache/conftool/dbconfig/20230306-182402-marostegui.json
18:23 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
18:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
18:21 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
18:12 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45099 and previous config saved to /var/cache/conftool/dbconfig/20230306-181017-marostegui.json
18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45098 and previous config saved to /var/cache/conftool/dbconfig/20230306-180940-marostegui.json
18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45097 and previous config saved to /var/cache/conftool/dbconfig/20230306-180249-marostegui.json
18:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
18:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45096 and previous config saved to /var/cache/conftool/dbconfig/20230306-180228-marostegui.json
17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45095 and previous config saved to /var/cache/conftool/dbconfig/20230306-175433-marostegui.json
17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45094 and previous config saved to /var/cache/conftool/dbconfig/20230306-175254-marostegui.json
17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45093 and previous config saved to /var/cache/conftool/dbconfig/20230306-175218-marostegui.json
17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45092 and previous config saved to /var/cache/conftool/dbconfig/20230306-174721-marostegui.json
17:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45091 and previous config saved to /var/cache/conftool/dbconfig/20230306-173927-marostegui.json
17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45090 and previous config saved to /var/cache/conftool/dbconfig/20230306-173711-marostegui.json
17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45089 and previous config saved to /var/cache/conftool/dbconfig/20230306-173350-marostegui.json
17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
17:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45088 and previous config saved to /var/cache/conftool/dbconfig/20230306-173328-marostegui.json
17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45087 and previous config saved to /var/cache/conftool/dbconfig/20230306-173215-marostegui.json
17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45086 and previous config saved to /var/cache/conftool/dbconfig/20230306-172205-marostegui.json
17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45085 and previous config saved to /var/cache/conftool/dbconfig/20230306-171821-marostegui.json
17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45084 and previous config saved to /var/cache/conftool/dbconfig/20230306-171708-marostegui.json
17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45083 and previous config saved to /var/cache/conftool/dbconfig/20230306-170657-marostegui.json
17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45082 and previous config saved to /var/cache/conftool/dbconfig/20230306-170315-marostegui.json
16:54 andrew@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759 (duration: 05m 19s)
16:49 andrew@deploy2002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759
16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45081 and previous config saved to /var/cache/conftool/dbconfig/20230306-164808-marostegui.json
16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45080 and previous config saved to /var/cache/conftool/dbconfig/20230306-164245-marostegui.json
16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
16:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-codfw
16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45079 and previous config saved to /var/cache/conftool/dbconfig/20230306-164158-marostegui.json
16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45078 and previous config saved to /var/cache/conftool/dbconfig/20230306-163806-marostegui.json
16:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
16:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
16:32 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-codfw
16:29 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45077 and previous config saved to /var/cache/conftool/dbconfig/20230306-162651-marostegui.json
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45076 and previous config saved to /var/cache/conftool/dbconfig/20230306-161652-marostegui.json
16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45075 and previous config saved to /var/cache/conftool/dbconfig/20230306-161631-marostegui.json
16:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45074 and previous config saved to /var/cache/conftool/dbconfig/20230306-161321-marostegui.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45073 and previous config saved to /var/cache/conftool/dbconfig/20230306-161144-marostegui.json
16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2014.codfw.wmnet
16:05 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2014.codfw.wmnet
16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2014.codfw.wmnet
16:04 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2014.codfw.wmnet
16:03 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2013.codfw.wmnet
16:02 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2013.codfw.wmnet
16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2013.codfw.wmnet
16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45072 and previous config saved to /var/cache/conftool/dbconfig/20230306-160124-marostegui.json
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45071 and previous config saved to /var/cache/conftool/dbconfig/20230306-155815-marostegui.json
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45070 and previous config saved to /var/cache/conftool/dbconfig/20230306-155638-marostegui.json
15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45069 and previous config saved to /var/cache/conftool/dbconfig/20230306-155428-marostegui.json
15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
15:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45068 and previous config saved to /var/cache/conftool/dbconfig/20230306-155030-marostegui.json
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45067 and previous config saved to /var/cache/conftool/dbconfig/20230306-154618-marostegui.json
15:45 otto@deploy2002: Finished deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided) (duration: 01m 25s)
15:44 otto@deploy2002: Started deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided)
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45066 and previous config saved to /var/cache/conftool/dbconfig/20230306-154308-marostegui.json
15:40 otto@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided) (duration: 01m 27s)
15:39 otto@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided)
15:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2014.codfw.wmnet
15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45065 and previous config saved to /var/cache/conftool/dbconfig/20230306-153524-marostegui.json
15:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45064 and previous config saved to /var/cache/conftool/dbconfig/20230306-153111-marostegui.json
15:30 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
15:30 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2014.codfw.wmnet
15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45063 and previous config saved to /var/cache/conftool/dbconfig/20230306-152801-marostegui.json
15:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
15:26 mforns@deploy2002: Finished deploy [airflow-dags/analytics@2fa7484]: (no justification provided) (duration: 00m 17s)
15:25 mforns@deploy2002: Started deploy [airflow-dags/analytics@2fa7484]: (no justification provided)
15:25 volans@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:23 zabe@deploy2002: Finished scap: Backport for Add logo for azwikimedia and vewikimedia (T331177) (duration: 08m 33s)
15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45062 and previous config saved to /var/cache/conftool/dbconfig/20230306-152017-marostegui.json
15:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
15:16 zabe@deploy2002: zabe: Backport for Add logo for azwikimedia and vewikimedia (T331177) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:14 zabe@deploy2002: Started scap: Backport for Add logo for azwikimedia and vewikimedia (T331177)
15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45061 and previous config saved to /var/cache/conftool/dbconfig/20230306-150956-marostegui.json
15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:08 Lucas_WMDE: UTC afternoon backport+config window done
15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
15:05 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45060 and previous config saved to /var/cache/conftool/dbconfig/20230306-150510-marostegui.json
15:04 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
15:02 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45059 and previous config saved to /var/cache/conftool/dbconfig/20230306-150115-marostegui.json
15:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45058 and previous config saved to /var/cache/conftool/dbconfig/20230306-150054-marostegui.json
14:59 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45057 and previous config saved to /var/cache/conftool/dbconfig/20230306-145945-marostegui.json
14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45056 and previous config saved to /var/cache/conftool/dbconfig/20230306-145924-marostegui.json
14:57 herron: failing grafana over to codfw T329073
14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45055 and previous config saved to /var/cache/conftool/dbconfig/20230306-145052-marostegui.json
14:50 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
14:49 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45054 and previous config saved to /var/cache/conftool/dbconfig/20230306-144547-marostegui.json
14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45053 and previous config saved to /var/cache/conftool/dbconfig/20230306-144417-marostegui.json
14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45051 and previous config saved to /var/cache/conftool/dbconfig/20230306-143546-marostegui.json
14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45050 and previous config saved to /var/cache/conftool/dbconfig/20230306-143041-marostegui.json
14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45049 and previous config saved to /var/cache/conftool/dbconfig/20230306-142910-marostegui.json
14:25 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45048 and previous config saved to /var/cache/conftool/dbconfig/20230306-142039-marostegui.json
14:16 sukhe: running authdns-update for CR 894652
14:15 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45047 and previous config saved to /var/cache/conftool/dbconfig/20230306-141534-marostegui.json
14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45046 and previous config saved to /var/cache/conftool/dbconfig/20230306-141404-marostegui.json
14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45045 and previous config saved to /var/cache/conftool/dbconfig/20230306-140533-marostegui.json
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45044 and previous config saved to /var/cache/conftool/dbconfig/20230306-140339-marostegui.json
14:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
14:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45043 and previous config saved to /var/cache/conftool/dbconfig/20230306-140317-marostegui.json
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45042 and previous config saved to /var/cache/conftool/dbconfig/20230306-134820-marostegui.json
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45041 and previous config saved to /var/cache/conftool/dbconfig/20230306-134811-marostegui.json
13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1001.eqiad.wmnet,service=thanos-web
13:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
13:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45040 and previous config saved to /var/cache/conftool/dbconfig/20230306-133451-marostegui.json
13:34 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-canary
13:34 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45039 and previous config saved to /var/cache/conftool/dbconfig/20230306-133304-marostegui.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45038 and previous config saved to /var/cache/conftool/dbconfig/20230306-131945-marostegui.json
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45037 and previous config saved to /var/cache/conftool/dbconfig/20230306-131758-marostegui.json
13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45036 and previous config saved to /var/cache/conftool/dbconfig/20230306-131545-marostegui.json
13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45035 and previous config saved to /var/cache/conftool/dbconfig/20230306-131214-marostegui.json
13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45034 and previous config saved to /var/cache/conftool/dbconfig/20230306-130933-marostegui.json
13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
13:09 moritzm: rearmed keyholder on deploy1002 following reboot
13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45033 and previous config saved to /var/cache/conftool/dbconfig/20230306-130854-marostegui.json
13:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1002.eqiad.wmnet with OS bullseye
13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45032 and previous config saved to /var/cache/conftool/dbconfig/20230306-130438-marostegui.json
12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45031 and previous config saved to /var/cache/conftool/dbconfig/20230306-125707-marostegui.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45030 and previous config saved to /var/cache/conftool/dbconfig/20230306-125348-marostegui.json
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45029 and previous config saved to /var/cache/conftool/dbconfig/20230306-124932-marostegui.json
12:48 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
12:46 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45028 and previous config saved to /var/cache/conftool/dbconfig/20230306-124341-marostegui.json
12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45027 and previous config saved to /var/cache/conftool/dbconfig/20230306-124308-marostegui.json
12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45026 and previous config saved to /var/cache/conftool/dbconfig/20230306-124200-marostegui.json
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45025 and previous config saved to /var/cache/conftool/dbconfig/20230306-123841-marostegui.json
12:32 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1002.eqiad.wmnet with OS bullseye
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45024 and previous config saved to /var/cache/conftool/dbconfig/20230306-122801-marostegui.json
12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45023 and previous config saved to /var/cache/conftool/dbconfig/20230306-122654-marostegui.json
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45022 and previous config saved to /var/cache/conftool/dbconfig/20230306-122546-marostegui.json
12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45021 and previous config saved to /var/cache/conftool/dbconfig/20230306-122524-marostegui.json
12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45020 and previous config saved to /var/cache/conftool/dbconfig/20230306-122334-marostegui.json
12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45019 and previous config saved to /var/cache/conftool/dbconfig/20230306-121255-marostegui.json
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45018 and previous config saved to /var/cache/conftool/dbconfig/20230306-121018-marostegui.json
12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45017 and previous config saved to /var/cache/conftool/dbconfig/20230306-120328-marostegui.json
12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45016 and previous config saved to /var/cache/conftool/dbconfig/20230306-115748-marostegui.json
11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45015 and previous config saved to /var/cache/conftool/dbconfig/20230306-115511-marostegui.json
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45014 and previous config saved to /var/cache/conftool/dbconfig/20230306-115201-marostegui.json
11:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45013 and previous config saved to /var/cache/conftool/dbconfig/20230306-115140-marostegui.json
11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P45012 and previous config saved to /var/cache/conftool/dbconfig/20230306-114354-marostegui.json
11:42 vgutierrez: enable ESI testing in cp4044 - T308799
11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45011 and previous config saved to /var/cache/conftool/dbconfig/20230306-114004-marostegui.json
11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45010 and previous config saved to /var/cache/conftool/dbconfig/20230306-113856-marostegui.json
11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45009 and previous config saved to /var/cache/conftool/dbconfig/20230306-113835-marostegui.json
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45008 and previous config saved to /var/cache/conftool/dbconfig/20230306-113633-marostegui.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45007 and previous config saved to /var/cache/conftool/dbconfig/20230306-112847-marostegui.json
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45006 and previous config saved to /var/cache/conftool/dbconfig/20230306-112328-marostegui.json
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45005 and previous config saved to /var/cache/conftool/dbconfig/20230306-112126-marostegui.json
11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45003 and previous config saved to /var/cache/conftool/dbconfig/20230306-111340-marostegui.json
11:09 jbond: enable puppet fleet wide post reboot of puppetdb
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45002 and previous config saved to /var/cache/conftool/dbconfig/20230306-110822-marostegui.json
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45001 and previous config saved to /var/cache/conftool/dbconfig/20230306-110620-marostegui.json
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45000 and previous config saved to /var/cache/conftool/dbconfig/20230306-110031-marostegui.json
11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44999 and previous config saved to /var/cache/conftool/dbconfig/20230306-110009-marostegui.json
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44998 and previous config saved to /var/cache/conftool/dbconfig/20230306-105834-marostegui.json
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44997 and previous config saved to /var/cache/conftool/dbconfig/20230306-105315-marostegui.json
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44996 and previous config saved to /var/cache/conftool/dbconfig/20230306-105206-marostegui.json
10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44995 and previous config saved to /var/cache/conftool/dbconfig/20230306-105145-marostegui.json
10:49 jbond: disable puppet fleet wide to reboot puppetdb
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44994 and previous config saved to /var/cache/conftool/dbconfig/20230306-104503-marostegui.json
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44993 and previous config saved to /var/cache/conftool/dbconfig/20230306-103639-marostegui.json
10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44992 and previous config saved to /var/cache/conftool/dbconfig/20230306-103525-marostegui.json
10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44991 and previous config saved to /var/cache/conftool/dbconfig/20230306-103503-marostegui.json
10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44990 and previous config saved to /var/cache/conftool/dbconfig/20230306-102956-marostegui.json
10:29 vgutierrez: enable haproxy systemd service unit hardening in cp4045 - T323944
10:29 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1001.eqiad.wmnet with OS bullseye
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44989 and previous config saved to /var/cache/conftool/dbconfig/20230306-102132-marostegui.json
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44988 and previous config saved to /var/cache/conftool/dbconfig/20230306-101957-marostegui.json
10:18 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:17 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44987 and previous config saved to /var/cache/conftool/dbconfig/20230306-101450-marostegui.json
10:12 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
10:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:12 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44986 and previous config saved to /var/cache/conftool/dbconfig/20230306-100901-marostegui.json
10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44985 and previous config saved to /var/cache/conftool/dbconfig/20230306-100840-marostegui.json
10:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
10:07 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44984 and previous config saved to /var/cache/conftool/dbconfig/20230306-100626-marostegui.json
10:05 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44983 and previous config saved to /var/cache/conftool/dbconfig/20230306-100450-marostegui.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44982 and previous config saved to /var/cache/conftool/dbconfig/20230306-100417-marostegui.json
10:04 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44981 and previous config saved to /var/cache/conftool/dbconfig/20230306-100356-marostegui.json
09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host deploy1002.eqiad.wmnet
09:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44980 and previous config saved to /var/cache/conftool/dbconfig/20230306-095333-marostegui.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44979 and previous config saved to /var/cache/conftool/dbconfig/20230306-094944-marostegui.json
09:49 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
09:49 nfraison@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-conf1001.eqiad.wmnet with OS bullseye
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44978 and previous config saved to /var/cache/conftool/dbconfig/20230306-094849-marostegui.json
09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44977 and previous config saved to /var/cache/conftool/dbconfig/20230306-094341-root.json
09:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44976 and previous config saved to /var/cache/conftool/dbconfig/20230306-093827-marostegui.json
09:36 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44975 and previous config saved to /var/cache/conftool/dbconfig/20230306-093343-marostegui.json
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44974 and previous config saved to /var/cache/conftool/dbconfig/20230306-092836-root.json
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44973 and previous config saved to /var/cache/conftool/dbconfig/20230306-092557-marostegui.json
09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
09:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44972 and previous config saved to /var/cache/conftool/dbconfig/20230306-092536-marostegui.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44971 and previous config saved to /var/cache/conftool/dbconfig/20230306-092320-marostegui.json
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44970 and previous config saved to /var/cache/conftool/dbconfig/20230306-091836-marostegui.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44969 and previous config saved to /var/cache/conftool/dbconfig/20230306-091733-marostegui.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44968 and previous config saved to /var/cache/conftool/dbconfig/20230306-091728-marostegui.json
09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44967 and previous config saved to /var/cache/conftool/dbconfig/20230306-091706-marostegui.json
09:14 dcausse: depooling & restarting blazegraph on wdqs1006 (stuck for 48+ hours)
09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44966 and previous config saved to /var/cache/conftool/dbconfig/20230306-091330-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44965 and previous config saved to /var/cache/conftool/dbconfig/20230306-091030-marostegui.json
09:06 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001 (duration: 00m 12s)
09:06 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001
09:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44964 and previous config saved to /var/cache/conftool/dbconfig/20230306-090416-marostegui.json
09:02 vgutierrez: disabling haproxy systemd service unit hardening in ulsfo - T323944
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44963 and previous config saved to /var/cache/conftool/dbconfig/20230306-090200-marostegui.json
09:00 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002 (duration: 00m 07s)
09:00 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44962 and previous config saved to /var/cache/conftool/dbconfig/20230306-085825-root.json
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44961 and previous config saved to /var/cache/conftool/dbconfig/20230306-085523-marostegui.json
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44960 and previous config saved to /var/cache/conftool/dbconfig/20230306-084910-marostegui.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44959 and previous config saved to /var/cache/conftool/dbconfig/20230306-084653-marostegui.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44958 and previous config saved to /var/cache/conftool/dbconfig/20230306-084320-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44957 and previous config saved to /var/cache/conftool/dbconfig/20230306-084017-marostegui.json
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44956 and previous config saved to /var/cache/conftool/dbconfig/20230306-083403-marostegui.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44955 and previous config saved to /var/cache/conftool/dbconfig/20230306-083147-marostegui.json
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44954 and previous config saved to /var/cache/conftool/dbconfig/20230306-083038-marostegui.json
08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
08:28 moritzm: rolling restart of Apache on mw* to pick up apr-util security updates
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44953 and previous config saved to /var/cache/conftool/dbconfig/20230306-082815-root.json
08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44952 and previous config saved to /var/cache/conftool/dbconfig/20230306-082645-marostegui.json
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
08:22 kartik@deploy2002: Finished scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) (duration: 19m 12s)
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44951 and previous config saved to /var/cache/conftool/dbconfig/20230306-081857-marostegui.json
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44950 and previous config saved to /var/cache/conftool/dbconfig/20230306-081711-marostegui.json
08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44949 and previous config saved to /var/cache/conftool/dbconfig/20230306-081639-marostegui.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44948 and previous config saved to /var/cache/conftool/dbconfig/20230306-081310-root.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44947 and previous config saved to /var/cache/conftool/dbconfig/20230306-081305-marostegui.json
08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44946 and previous config saved to /var/cache/conftool/dbconfig/20230306-081244-marostegui.json
08:12 kartik@deploy2002: kartik: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44945 and previous config saved to /var/cache/conftool/dbconfig/20230306-081138-marostegui.json
08:02 kartik@deploy2002: Started scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482)
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44944 and previous config saved to /var/cache/conftool/dbconfig/20230306-080132-marostegui.json
08:00 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44943 and previous config saved to /var/cache/conftool/dbconfig/20230306-075737-marostegui.json
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44942 and previous config saved to /var/cache/conftool/dbconfig/20230306-075632-marostegui.json
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122', diff saved to https://phabricator.wikimedia.org/P44941 and previous config saved to /var/cache/conftool/dbconfig/20230306-074830-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44940 and previous config saved to /var/cache/conftool/dbconfig/20230306-074626-marostegui.json
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44939 and previous config saved to /var/cache/conftool/dbconfig/20230306-074231-marostegui.json
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44938 and previous config saved to /var/cache/conftool/dbconfig/20230306-074125-marostegui.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44937 and previous config saved to /var/cache/conftool/dbconfig/20230306-073707-marostegui.json
07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44936 and previous config saved to /var/cache/conftool/dbconfig/20230306-073119-marostegui.json
07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44935 and previous config saved to /var/cache/conftool/dbconfig/20230306-072724-marostegui.json
07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2094.codfw.wmnet
07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:22 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44934 and previous config saved to /var/cache/conftool/dbconfig/20230306-072132-marostegui.json
07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2094.codfw.wmnet
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44933 and previous config saved to /var/cache/conftool/dbconfig/20230306-070814-marostegui.json
07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
06:29 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/public to dumpsdata1007, no bandwidth cap
06:03 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/private to dumpsdata1007 (did this for 1006 about an hour ago, forgot to log), no bandwidth cap

2023-03-04

14:56 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 02m 17s)
14:53 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
14:44 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 08m 56s)
14:35 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
14:32 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 00m 46s)
14:31 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: (no justification provided)
06:09 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1006, no bandwidth cap

2023-03-03

20:58 inflatador: bking@cumin2002 persistently unban all elastic nodes in eqiad T322082
20:55 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
20:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2070.codfw.wmnet with OS bullseye
20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
20:33 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
20:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
20:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
20:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
20:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
20:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
20:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
20:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
20:05 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
19:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
19:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
19:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
19:40 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
19:39 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
19:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
19:36 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
19:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
19:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
19:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
19:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
18:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
18:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
18:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2070.codfw.wmnet with OS bullseye
18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
18:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
18:17 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
18:16 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:47 krinkle@deploy2002: Synchronized wmf-config/mc.php: Ic55725: Prepare mc.php for next week train (duration: 07m 39s)
17:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
17:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
17:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
17:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
17:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
17:01 inflatador: bking@cumin2002 ban elastic1059-1066 T322082
16:56 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
16:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
16:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
16:44 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
16:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
16:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
16:38 bking@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
16:37 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
16:36 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
16:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
16:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
16:09 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
15:53 mforns@deploy2002: Finished deploy [airflow-dags/analytics@ad17aa9]: (no justification provided) (duration: 00m 22s)
15:53 mforns@deploy2002: Started deploy [airflow-dags/analytics@ad17aa9]: (no justification provided)
15:47 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
15:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:43 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance (duration: 00m 21s)
15:42 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance
15:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:39 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
15:36 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:33 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:32 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:32 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:26 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
15:24 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:24 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
15:21 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
15:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
15:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1004.wikimedia.org
15:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1004.wikimedia.org on all recursors
15:02 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1004.wikimedia.org on all recursors
15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
14:59 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
14:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
14:56 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:56 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1004.wikimedia.org
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1003.wikimedia.org
14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1003.wikimedia.org on all recursors
14:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1003.wikimedia.org on all recursors
14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: rerack
14:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: rerack
14:24 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1003.wikimedia.org
14:09 inflatador: bking@cumin2002 banning elastic1053-59 from the cluster in preparation for T322082
14:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
13:51 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
13:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
13:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
13:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
13:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
12:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
11:29 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
11:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
11:13 moritzm: imported PHP 7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 to component/icu67 (build of PHP against co-installable ICU67) T329491
10:39 vgutierrez: restart ntp.service in dns2001
10:30 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
10:25 moritzm: installing 5.10.162 kernels on buster systems running Linux 5.10
10:12 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
10:12 root@cumin2002: START - Cookbook sre.idm.logout Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
09:56 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 1119 hosts
09:55 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 1119 hosts
09:54 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 909 hosts
09:54 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 909 hosts
09:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
09:45 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
09:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
09:10 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:10 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:07 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:01 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
08:54 elukey: restart pybal on lvs2010 (standby) and then on lvs2009 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
08:48 elukey: restart pybal on lvs1020 (standby) and then on lvs1019 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
08:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
08:36 vgutierrez: restarting ntp in dns1001
07:29 elukey: truncate /var/log/auth.log.1 on krb1001 to free space (root partition almost filled up)
01:12 mutante: releases1002: deleting /usr/local/sbin/sync-srv-org-wikimedia-reprepro-releases1002.eqiad.wmnet, which confusingly contains an rsync command that syncs from releases1001, a host that no longer exists - T330960
00:13 mutante: switching releases.wikimedia.org from eqiad to codfw - T330960

2023-03-02

23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2001-2003].codfw.wmnet
23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
22:45 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
22:37 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
22:11 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts wdqs[2001-2003].codfw.wmnet
21:22 TheresNoTime: close UTC late backport and config training
21:10 samtar@deploy2002: Finished scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) (duration: 08m 03s)
21:04 samtar@deploy2002: superpes and samtar: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2001.wikimedia.org with OS bullseye
21:02 samtar@deploy2002: Started scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051)
21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1001.wikimedia.org with OS bullseye
20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
20:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
20:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
20:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2001.wikimedia.org with OS bullseye
20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
20:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
20:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
20:08 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:07 brett@cumin2002: START - Cookbook sre.dns.netbox
20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1001.wikimedia.org with OS bullseye
19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2014.codfw.wmnet with OS bullseye
19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
19:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
19:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2014.codfw.wmnet with OS bullseye
18:10 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
18:08 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:09 oblivian@deploy2002: Finished scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) (duration: 09m 16s)
17:01 oblivian@deploy2002: oblivian: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
16:59 oblivian@deploy2002: Started scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942)
15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
15:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
15:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:41 jynus: restart db2099 T330218
14:32 Lucas_WMDE: UTC afternoon backport+config window done
14:29 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove unused Wikibase config variables (T330410) (duration: 08m 41s)
14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Remove unused Wikibase config variables (T330410) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove unused Wikibase config variables (T330410)
13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1010.eqiad.wmnet with OS bullseye
13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
13:40 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:48 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:48 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:47 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:47 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:46 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:46 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:42 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:42 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:00 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
10:42 claime: Running authdns-update for 893675
10:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bullseye
10:16 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test (duration: 00m 12s)
10:16 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test
10:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bullseye
10:05 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
10:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:50 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
09:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
09:47 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
09:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
09:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
09:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
09:28 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1010.eqiad.wmnet with OS bullseye
09:20 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.25 refs T325588
09:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1010']
09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
09:06 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
09:04 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1010']
08:58 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
08:58 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1010
08:57 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1010
08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
08:46 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
08:39 dcaro@cumin1001: START - Cookbook sre.dns.netbox
08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bullseye
08:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
08:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
08:34 marostegui: Stop MySQL on db2093 T330827
08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
08:18 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
08:15 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
08:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
08:05 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
07:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
07:48 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
07:48 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
07:48 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:47 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:38 apergos: started rsync of xmldatadumps/private from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
07:38 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:36 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
07:17 marostegui: Stop MySQL on db2095 T330975
01:23 mutante: doc2001 - stopping apache2 to test alerting - active server is doc1002 but should be switched T327973 T330963
01:08 mutante: releases2002 - stopping apache2 to test alerting (active server is 1002 but should be switched) T327975 T330960
00:28 mutante: planet1002 - stopping apache2 to test alerting (active host is codfw)

2023-03-01

23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1002.wikimedia.org with OS bullseye
23:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
22:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
22:52 mutante: apt1001 - systemctl reset-failed T328907
22:45 mforns@deploy2002: Finished deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided) (duration: 00m 23s)
22:45 mforns@deploy2002: Started deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided)
22:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1002.wikimedia.org with OS bullseye
22:42 mforns@deploy2002: Finished deploy [airflow-dags/analytics@51e92b1]: (no justification provided) (duration: 00m 21s)
22:42 mforns@deploy2002: Started deploy [airflow-dags/analytics@51e92b1]: (no justification provided)
21:41 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a] (duration: 01m 22s)
21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a]
21:39 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a] (duration: 00m 07s)
21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a]
21:38 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a] (duration: 10m 55s)
21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2002.wikimedia.org with OS bullseye
21:27 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a]
21:23 TheresNoTime: closing UTC late backport window
21:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
21:16 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
21:11 samtar@deploy2002: Finished scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) (duration: 09m 30s)
21:04 samtar@deploy2002: superpes and samtar: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:02 samtar@deploy2002: Started scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047)
21:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2002.wikimedia.org with OS bullseye
20:43 zabe: move rev_comment_id migration screens from mwmaint1002 to mwmaint2002 # T275246
19:47 brett: re-adding dns3001 to next-hop routing via juniper - T321309
19:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3001.wikimedia.org with OS bullseye
19:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3001.wikimedia.org with OS bullseye
18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
18:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS buster
17:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
17:41 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
17:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test (duration: 00m 05s)
17:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test
17:26 root@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
17:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
17:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-serve1006.eqiad.wmnet with OS bullseye
17:06 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
17:05 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
16:56 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
16:28 brett: Remove dns3001 DNS request routing via juniper - T321309
16:28 XioNoX: rollback port 80 block in esams - T330683
16:26 taavi@deploy2002: Finished scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) (duration: 08m 23s)
16:21 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
16:20 taavi@deploy2002: taavi: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
16:19 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:19 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
16:18 taavi@deploy2002: Started scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031)
16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:15 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
16:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
16:02 bblack: cr[23]-esams: manually adding brett's ssh-rsa to match https://gerrit.wikimedia.org/r/c/operations/homer/public/+/892551
16:01 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
16:00 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
15:57 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
15:57 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
15:44 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
15:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:39 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:35 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
15:32 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
15:28 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
15:22 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
15:20 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-canary
15:18 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-canary
15:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:11 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1005']
15:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
15:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
15:06 hashar: Restarting Apache on Gerrit host
15:04 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
15:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:57 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
14:52 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1005
14:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
14:45 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1005
14:34 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
14:33 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
14:32 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-canary
14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1008.eqiad.wmnet with OS bullseye
14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
14:29 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-canary
14:27 taavi: re-start persistRevisionThreadItems.php on itwiki from P44912 after DC switchover T315510
14:27 claime: End mediawiki datacenter switchover - T327920
14:26 cgoubert@deploy2002: Finished scap: Backport for debug.json: List primary DC servers first (T327920) (duration: 07m 54s)
14:20 cgoubert@deploy2002: cgoubert: Backport for debug.json: List primary DC servers first (T327920) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:18 cgoubert@deploy2002: Started scap: Backport for debug.json: List primary DC servers first (T327920)
14:16 claime: Removing scap lock - T327920
14:15 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2122 weight', diff saved to https://phabricator.wikimedia.org/P44913 and previous config saved to /var/cache/conftool/dbconfig/20230301-141414-marostegui.json
14:10 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
14:09 claime: Phase 9.5 DNS records for new database masters updated - T327920
14:08 claime: Phase 9.5 Update DNS records for new database masters - T327920
14:07 taavi: test
14:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
14:05 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
14:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
14:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
14:02 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
14:02 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-03-01 14:02:09.272468
14:00 cgoubert@cumin1001: MediaWiki read-only period starts at: 2023-03-01 14:00:10.075167
14:00 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
13:56 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1007.eqiad.wmnet with OS bullseye
13:52 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
13:51 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
13:51 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:49 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
13:49 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
13:40 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
13:40 claime: Starting mediawiki datacenter switchover step 0 - T327920
13:37 dcaro@cumin1001: START - Cookbook sre.dns.netbox
13:31 claime: Locking scap deployments for datacenter switchover - T327920
13:30 krinkle@deploy2002: Synchronized wmf-config/: I3beefb filebackend cleanup (duration: 07m 13s)
13:19 krinkle@deploy2002: Synchronized wmf-config/: Ie063fb - Remove config for former Rdbms logging (duration: 07m 39s)
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
13:17 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
13:10 claime: Adding scheduled maintenance for switchover to statuspage - T327920
13:09 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
12:40 marostegui: Upgrade db2183 to 10.6 T330861
12:28 moritzm: upgrade mwmaint to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
11:58 moritzm: upgrade parse/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
11:09 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:08 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:07 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1010.eqiad.wmnet
11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:00 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:57 moritzm: upgrade cloudweb to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
10:56 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
10:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
10:32 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
10:30 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
10:25 dcaro@cumin1001: START - Cookbook sre.dns.netbox
10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
10:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
10:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
10:11 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
10:03 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
10:02 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1010.eqiad.wmnet
09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
09:58 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
09:57 marostegui: Stop db1117:3325 and db1176 T329478
09:57 root@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8309
09:47 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8309
09:39 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad
09:38 moritzm: installing tiff security updates
09:31 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro
09:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
09:30 jnuche@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.25 refs T325588 (duration: 07m 48s)
09:26 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
09:23 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.25 refs T325588
09:15 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
09:15 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
08:58 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
08:56 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
08:51 moritzm: upgrade mw/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
08:45 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
08:42 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1003.eqiad.wmnet with OS bullseye
08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1002.eqiad.wmnet with OS bullseye
08:40 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1001.eqiad.wmnet with OS bullseye
08:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 918 hosts
08:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 918 hosts
08:35 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 1110 hosts
08:34 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 1110 hosts
08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
08:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
08:26 jynus: stopping db2184 for testing mariadb 10.6 recovery workflow T319383
08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
08:15 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
08:14 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1001.eqiad.wmnet with OS bullseye
08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1002.eqiad.wmnet with OS bullseye
08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1003.eqiad.wmnet with OS bullseye
08:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: T330758
08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: T330758
06:14 marostegui: Stop MySQL on db2094 T330828
05:37 marostegui: Stop mysql on codfw sanitarium host db2095 (s2, s7, s6, s4) to clone db2187 T326596
05:37 eileen: civicrm upgraded from ffc16d2d to fe2c06f6
00:25 ejegg: civicrm rolled back from d199694e to ffc16d2d
00:06 zabe@deploy2002: Finished scap: T198673 (duration: 07m 25s)

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
