Server Admin Log/Archive 64

2023-03-31

  • 23:55 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus5002.eqsin.wmnet with reason: host reimage
  • 23:52 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus5002.eqsin.wmnet with reason: host reimage
  • 23:21 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus5002.eqsin.wmnet with OS bullseye
  • 23:14 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus6002.drmrs.wmnet with OS bullseye
  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus4002.ulsfo.wmnet with OS bullseye
  • 23:02 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus5002.eqsin.wmnet
  • 23:02 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 23:01 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 23:01 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus6002.drmrs.wmnet with reason: host reimage
  • 22:58 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus4002.ulsfo.wmnet with reason: host reimage
  • 22:57 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus6002.drmrs.wmnet with reason: host reimage
  • 22:55 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus4002.ulsfo.wmnet with reason: host reimage
  • 22:43 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus6002.drmrs.wmnet with OS bullseye
  • 22:41 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus4002.ulsfo.wmnet with OS bullseye
  • 22:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on miscweb[2002-2003].codfw.wmnet,miscweb[1002-1003].eqiad.wmnet with reason: maintenance
  • 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on miscweb[2002-2003].codfw.wmnet,miscweb[1002-1003].eqiad.wmnet with reason: maintenance
  • 22:01 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
  • 22:01 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
  • 22:01 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 22:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
  • 22:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1132.eqiad.wmnet
  • 22:00 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 21:58 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 21:58 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
  • 21:57 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host prometheus6002.drmrs.wmnet with OS bullseye
  • 21:52 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host prometheus4002.ulsfo.wmnet with OS bullseye
  • 21:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
  • 21:12 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5002
  • 21:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:11 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 21:07 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5002
  • 21:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus5002.eqsin.wmnet
  • 21:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
  • 21:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
  • 21:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 21:04 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 21:02 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 21:02 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
  • 21:02 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
  • 21:02 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:02 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 21:00 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 20:58 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:58 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
  • 20:41 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus5002.eqsin.wmnet
  • 20:41 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
  • 20:41 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
  • 20:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 20:39 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 20:38 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus6002.drmrs.wmnet with OS bullseye
  • 20:38 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus4002.ulsfo.wmnet with OS bullseye
  • 20:38 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus6002.drmrs.wmnet
  • 20:38 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
  • 20:37 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:37 denisse@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 20:37 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus4002.ulsfo.wmnet
  • 20:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
  • 20:37 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
  • 20:33 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 20:30 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
  • 20:16 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:58 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus3002.esams.wmnet with OS bullseye
  • 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus3002.esams.wmnet with reason: host reimage
  • 19:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:42 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus3002.esams.wmnet with reason: host reimage
  • 19:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus6002.drmrs.wmnet on all recursors
  • 19:37 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus6002.drmrs.wmnet on all recursors
  • 19:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
  • 19:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus6002.drmrs.wmnet - denisse@cumin1001"
  • 19:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1152.eqiad.wmnet']
  • 19:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:34 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus6002.drmrs.wmnet
  • 19:33 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus5002.eqsin.wmnet on all recursors
  • 19:33 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus5002.eqsin.wmnet on all recursors
  • 19:33 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073.eqiad.wmnet']
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 19:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus5002.eqsin.wmnet - denisse@cumin1001"
  • 19:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073.eqiad.wmnet']
  • 19:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 19:30 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus4002.ulsfo.wmnet on all recursors
  • 19:30 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus4002.ulsfo.wmnet on all recursors
  • 19:30 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
  • 19:29 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4002.ulsfo.wmnet - denisse@cumin1001"
  • 19:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:28 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus5002.eqsin.wmnet
  • 19:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus4002.ulsfo.wmnet
  • 19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1154.eqiad.wmnet']
  • 19:24 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host prometheus3002.esams.wmnet with OS bullseye
  • 19:14 andrewbogott: upgraded wikitech-static to 1.39.3
  • 19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1153.eqiad.wmnet']
  • 19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153.eqiad.wmnet']
  • 19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154.eqiad.wmnet']
  • 18:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1155.eqiad.wmnet']
  • 18:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1156.eqiad.wmnet']
  • 18:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152.eqiad.wmnet']
  • 18:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1151.eqiad.wmnet']
  • 18:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155.eqiad.wmnet']
  • 18:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156.eqiad.wmnet']
  • 18:41 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus3002.esams.wmnet
  • 18:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 18:40 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 18:23 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@30fae0e]: (no justification provided) (duration: 00m 20s)
  • 18:23 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@30fae0e]: (no justification provided)
  • 18:22 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@30fae0e]: bump discolytics to 0.12.0 (duration: 00m 20s)
  • 18:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@30fae0e]: bump discolytics to 0.12.0
  • 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 17:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 17:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
  • 17:40 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
  • 17:40 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:39 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:36 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 17:36 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
  • 17:32 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus3002.esams.wmnet
  • 17:32 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 17:27 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3002.esams.wmnet
  • 17:23 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus3002.esams.wmnet
  • 17:23 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
  • 17:23 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
  • 17:23 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:23 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:22 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:20 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 17:20 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
  • 17:20 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
  • 17:20 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:19 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:18 aqu@deploy2002: Finished deploy [airflow-dags/analytics@9182e44]: Fix for VirtualPageview Dag - Analytics [airflow-dags@9182e44] (duration: 00m 11s)
  • 17:18 aqu@deploy2002: Started deploy [airflow-dags/analytics@9182e44]: Fix for VirtualPageview Dag - Analytics [airflow-dags@9182e44]
  • 17:17 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 17:17 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
  • 17:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@48778b4]: bump discolytics to 0.11.0 (duration: 00m 19s)
  • 17:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@48778b4]: bump discolytics to 0.11.0
  • 17:16 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus3002.esams.wmnet
  • 17:16 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
  • 17:16 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
  • 17:16 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:16 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:15 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 17:13 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 17:13 denisse@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 16:55 sukhe: restart pybal on lvs4008 to set it primary LVS for high-traffic1
  • 16:54 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2aae7d0]: Fix for VirtualPageview Dag - Analytics [airflow-dags@2aae7d0] (duration: 00m 10s)
  • 16:54 aqu@deploy2002: Started deploy [airflow-dags/analytics@2aae7d0]: Fix for VirtualPageview Dag - Analytics [airflow-dags@2aae7d0]
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:15 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:15 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 16:10 ladsgroup@deploy2002: Finished scap: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612) (duration: 08m 18s)
  • 16:03 ladsgroup@deploy2002: matmarex and ladsgroup: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 16:02 ladsgroup@deploy2002: Started scap: Backport for Revert "Enable hidden tag for "Edit Check" project on Wikipedias" (T324733 T333612)
  • 16:00 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:00 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 15:49 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:49 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:26 ladsgroup@deploy1002: Finished scap: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php""" (duration: 19m 14s)
  • 15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:14 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php""" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 15:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:06 ladsgroup@deploy1002: Started scap: Backport for Revert "Revert "Revert "mwscript: Switch to use run.php"""
  • 15:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 14:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 14:47 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host ms-fe1014.eqiad.wmnet
  • 14:47 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 14:43 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe1014.eqiad.wmnet
  • 14:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 14:43 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe1014.eqiad.wmnet
  • 14:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 13:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 13:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:12 elukey: move kafka-jumbo1004's kafka broker cert to PKI - T296064
  • 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 13:11 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:10 phedenskog@deploy2002: Finished deploy [performance/navtiming@c30b954]: (no justification provided) (duration: 00m 05s)
  • 13:10 phedenskog@deploy2002: Started deploy [performance/navtiming@c30b954]: (no justification provided)
  • 13:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:09 elukey: restart kafkatee on centrallog2002 - test to see if there are issues connecting to the jumbo brokers running pki
  • 12:55 eoghan@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab
  • 12:46 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:45 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:25 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:04 eoghan@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab
  • 12:00 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab
  • 11:42 Emperor: shutdown ms-be1042 for battery swap T332883
  • 11:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW
  • 11:41 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW
  • 11:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151.eqiad.wmnet']
  • 11:09 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab
  • 11:08 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab
  • 11:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
  • 10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 10:45 Amir1: Failover m1 from db1101 to db1164 - T333123
  • 10:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 10:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149.eqiad.wmnet']
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
  • 10:25 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover
  • 10:25 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover
  • 10:18 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab
  • 10:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:54 elukey: move kafka-jumbo1003's kafka broker cert to PKI - T296064
  • 09:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance
  • 09:54 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:02 elukey: move kafka-jumbo1002's kafka broker cert to PKI - T296064
  • 08:47 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 08:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on an-worker1091.eqiad.wmnet with reason: Replacing battery
  • 08:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on an-worker1091.eqiad.wmnet with reason: Replacing battery
  • 08:32 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 08:27 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:25 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:25 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:14 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 06:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 06:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 06:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 06:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 06:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 06:43 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit1003.wikimedia.org with OS bullseye
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:04 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 01:00 ejegg: payments-wiki upgraded from b5df483f to 60d0aed5
  • 00:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1225.eqiad.wmnet with OS bullseye
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1224.eqiad.wmnet with OS bullseye
  • 00:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
  • 00:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: host reimage
  • 00:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: host reimage
  • 00:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149.eqiad.wmnet']
  • 00:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
  • 00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1223.eqiad.wmnet with OS bullseye
  • 00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1156.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1156.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1224.eqiad.wmnet with reason: host reimage
  • 00:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1225.eqiad.wmnet with OS bullseye
  • 00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bullseye
  • 00:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1222.eqiad.wmnet with OS bullseye
  • 00:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:06 cstone: SmashPig upgraded from e86b0a66 to 7c19151f
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1224.eqiad.wmnet with reason: host reimage
  • 00:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3002.esams.wmnet on all recursors
  • 00:04 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3002.esams.wmnet on all recursors
  • 00:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"
  • 00:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:02 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3002.esams.wmnet - denisse@cumin1001"

2023-03-30

  • 23:59 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:59 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3002.esams.wmnet
  • 23:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1155.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
  • 23:59 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1154.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1154.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1224.eqiad.wmnet with OS bullseye
  • 23:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1222.eqiad.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1222.eqiad.wmnet with reason: host reimage
  • 23:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1152.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1223.eqiad.wmnet with OS bullseye
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1209.eqiad.wmnet with OS bullseye
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1152.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1151.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1222.eqiad.wmnet with OS bullseye
  • 23:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1151.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1220.eqiad.wmnet with OS bullseye
  • 23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1221.eqiad.wmnet with OS bullseye
  • 23:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1209.eqiad.wmnet with reason: host reimage
  • 23:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1209.eqiad.wmnet with reason: host reimage
  • 23:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bullseye
  • 23:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1221.eqiad.wmnet with reason: host reimage
  • 23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1221.eqiad.wmnet with reason: host reimage
  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1220.eqiad.wmnet with reason: host reimage
  • 23:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1209.eqiad.wmnet with OS bullseye
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1220.eqiad.wmnet with reason: host reimage
  • 22:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1221.eqiad.wmnet with OS bullseye
  • 22:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1220.eqiad.wmnet with OS bullseye
  • 22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1209']
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
  • 22:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1209']
  • 22:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
  • 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
  • 22:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
  • 22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db1209']
  • 22:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
  • 22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1210']
  • 21:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1209']
  • 21:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1210']
  • 21:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
  • 21:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1225']
  • 21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1218.eqiad.wmnet with OS bullseye
  • 21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1219.eqiad.wmnet with OS bullseye
  • 21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1225']
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1219.eqiad.wmnet with reason: host reimage
  • 20:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1218.eqiad.wmnet with reason: host reimage
  • 20:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1219.eqiad.wmnet with reason: host reimage
  • 20:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1218.eqiad.wmnet with reason: host reimage
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1223']
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1224']
  • 20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1219.eqiad.wmnet with OS bullseye
  • 20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1218.eqiad.wmnet with OS bullseye
  • 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS bullseye
  • 20:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1216.eqiad.wmnet with OS bullseye
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1224']
  • 20:20 thcipriani@deploy2002: Finished scap: Backport for Remove inline script from United States static page (T331681) (duration: 09m 42s)
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1223']
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 20:12 thcipriani@deploy2002: nray and thcipriani: Backport for Remove inline script from United States static page (T331681) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: host reimage
  • 20:11 thcipriani@deploy2002: Started scap: Backport for Remove inline script from United States static page (T331681)
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1216.eqiad.wmnet with reason: host reimage
  • 20:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1221']
  • 20:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1222']
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1216.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1214.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1215.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1214.eqiad.wmnet with reason: host reimage
  • 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1215.eqiad.wmnet with reason: host reimage
  • 19:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1214.eqiad.wmnet with reason: host reimage
  • 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1215.eqiad.wmnet with reason: host reimage
  • 19:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1222']
  • 19:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1221']
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1220']
  • 19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1219']
  • 19:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1003.wikimedia.org with OS bullseye
  • 19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:15 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1214.eqiad.wmnet with OS bullseye
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1215.eqiad.wmnet with OS bullseye
  • 19:08 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:08 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1213.eqiad.wmnet with OS bullseye
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:04 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:04 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1220']
  • 18:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1219']
  • 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS bullseye
  • 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:52 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:49 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
  • 18:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
  • 18:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 18:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 18:33 dduvall@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.2 refs T330208
  • 18:32 SandraEbele: started Airflow mediawiki wikitext dags after killing Oozie jobs as part of Migration task
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1218']
  • 18:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1217']
  • 18:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:31 SandraEbele: Killed Oozie mediawiki-wikitext-history-coord and mediawiki-wikitext-current-coord
  • 18:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1213.eqiad.wmnet with OS bullseye
  • 18:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:23 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@5355ead]: (no justification provided) (duration: 00m 12s)
  • 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS bullseye
  • 18:22 ebysans@deploy2002: Started deploy [airflow-dags/analytics@5355ead]: (no justification provided)
  • 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bullseye
  • 18:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1211.eqiad.wmnet with OS bullseye
  • 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
  • 17:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1211.eqiad.wmnet with reason: host reimage
  • 17:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1211.eqiad.wmnet with reason: host reimage
  • 17:49 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bullseye
  • 17:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1211.eqiad.wmnet with OS bullseye
  • 17:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1207.eqiad.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 17:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1218']
  • 17:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1217']
  • 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
  • 17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit1003']
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
  • 17:28 SandraEbele: killed Oozie mediawiki-history-check_denormalize job and started Airflow mediawiki_history_check_denormalize dag.
  • 17:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1216']
  • 17:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1215']
  • 17:27 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@8b242c2]: (no justification provided) (duration: 00m 11s)
  • 17:27 ebysans@deploy2002: Started deploy [airflow-dags/analytics@8b242c2]: (no justification provided)
  • 17:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bullseye
  • 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage
  • 17:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage
  • 17:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1216']
  • 17:08 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1214']
  • 17:06 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:04 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1215']
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1213']
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1207.eqiad.wmnet with OS bullseye
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1214']
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1213']
  • 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1212']
  • 16:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1211']
  • 16:21 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2003-dev
  • 16:20 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2003-dev
  • 16:20 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2004-dev
  • 16:20 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2004-dev
  • 16:19 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2003-dev
  • 16:19 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2003-dev
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1212']
  • 16:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1211']
  • 16:09 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1209']
  • 16:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1209']
  • 16:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1208']
  • 16:01 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2001-dev
  • 16:01 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2001-dev
  • 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1207']
  • 15:45 cstone: SmashPig upgraded from 240c80a2 to e86b0a66
  • 15:44 mutante: phabricator maintenance window / deployment ended (T329974)
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1208']
  • 15:36 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2001-dev
  • 15:36 brennen@deploy2002: Finished deploy [phabricator/deployment@9f0866e]: deploy to phab1004 for T333516 (duration: 00m 42s)
  • 15:36 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2001-dev
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
  • 15:35 brennen@deploy2002: Started deploy [phabricator/deployment@9f0866e]: deploy to phab1004 for T333516
  • 15:34 brennen@deploy2002: Finished deploy [phabricator/deployment@9f0866e]: test deploy to phab2002 for T333516 (duration: 00m 30s)
  • 15:34 volans: upgraded spicerack to v6.4.1 on the cumin hosts
  • 15:34 brennen@deploy2002: Started deploy [phabricator/deployment@9f0866e]: test deploy to phab2002 for T333516
  • 15:34 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2002-dev
  • 15:33 mutante: phabricator maintenance / deploy window starting
  • 15:33 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2002-dev
  • 15:32 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2002-dev
  • 15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: maintenance
  • 15:32 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2002-dev
  • 15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: maintenance
  • 15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: maintenance
  • 15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: maintenance
  • 15:30 volans: uploaded spicerack_6.4.1 to apt.wikimedia.org bullseye-wikimedia
  • 15:14 lucaswerkmeister-wmde: Deployed security patch for T333569
  • 15:08 lucaswerkmeister-wmde: Deployed security patch for T333569
  • 14:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 14:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
  • 14:43 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:22 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:22 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 14:17 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 14:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
  • 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:36 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 12:32 joal@deploy2002: Finished deploy [airflow-dags/analytics@a6500cf]: Regular analytics weekly train (2nd) HOTFIX [airflow-dags/analytics@a6500cf] (duration: 00m 11s)
  • 12:31 joal@deploy2002: Started deploy [airflow-dags/analytics@a6500cf]: Regular analytics weekly train (2nd) HOTFIX [airflow-dags/analytics@a6500cf]
  • 12:27 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:26 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:17 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:17 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:17 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:15 ladsgroup@deploy2002: Finished scap: Backport for Set externallinks to WRITE BOTH everywhere (T321662) (duration: 14m 58s)
  • 12:08 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:02 ladsgroup@deploy2002: ladsgroup: Backport for Set externallinks to WRITE BOTH everywhere (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 12:00 ladsgroup@deploy2002: Started scap: Backport for Set externallinks to WRITE BOTH everywhere (T321662)
  • 11:57 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:50 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker1149-56 - jclark@cumin1001"
  • 11:49 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker1149-56 - jclark@cumin1001"
  • 11:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for rest-gateway - hnowlan@cumin1001"
  • 11:11 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for rest-gateway - hnowlan@cumin1001"
  • 11:10 ladsgroup@deploy2002: Finished scap: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800) (duration: 07m 59s)
  • 11:08 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 11:03 ladsgroup@deploy2002: ladsgroup: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:03 claime: Re-enabling puppet for cp-text - T331318
  • 11:02 ladsgroup@deploy2002: Started scap: Backport for Revert "Revert "mwscript: Switch to use run.php"" (T326800)
  • 10:58 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:58 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P45994 and previous config saved to /var/cache/conftool/dbconfig/20230330-105011-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1136 T333538', diff saved to https://phabricator.wikimedia.org/P45993 and previous config saved to /var/cache/conftool/dbconfig/20230330-104928-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1181 to s7 primary T333538', diff saved to https://phabricator.wikimedia.org/P45992 and previous config saved to /var/cache/conftool/dbconfig/20230330-104617-ladsgroup.json
  • 10:45 Amir1: Starting s7 eqiad failover from db1136 to db1181 - T333538
  • 10:44 volans@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P45989 and previous config saved to /var/cache/conftool/dbconfig/20230330-103506-ladsgroup.json
  • 10:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:27 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1181 with weight 0 T333538', diff saved to https://phabricator.wikimedia.org/P45988 and previous config saved to /var/cache/conftool/dbconfig/20230330-102012-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P45987 and previous config saved to /var/cache/conftool/dbconfig/20230330-102002-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T333538
  • 10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T333538
  • 10:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P45985 and previous config saved to /var/cache/conftool/dbconfig/20230330-100457-ladsgroup.json
  • 09:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:48 joal@deploy2002: Finished deploy [airflow-dags/analytics@b7b41ae]: Regular analytics weekly train (2nd) [airflow-dags/analytics@b7b41ae] (duration: 00m 11s)
  • 09:47 joal@deploy2002: Started deploy [airflow-dags/analytics@b7b41ae]: Regular analytics weekly train (2nd) [airflow-dags/analytics@b7b41ae]
  • 09:44 claime: Re-enabling puppet for cp-text_ulsfo - T331318
  • 09:36 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd] (hadoop-test): Regular analytics weekly train (2nd) TEST [analytics/refinery@359f4bd] (duration: 01m 28s)
  • 09:35 joal@deploy2002: Started deploy [analytics/refinery@359f4bd] (hadoop-test): Regular analytics weekly train (2nd) TEST [analytics/refinery@359f4bd]
  • 09:35 claime: Re-enabling puppet for cp4037 - T331318
  • 09:34 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd] (thin): Regular analytics weekly train (2nd) THIN [analytics/refinery@359f4bd] (duration: 00m 08s)
  • 09:34 joal@deploy2002: Started deploy [analytics/refinery@359f4bd] (thin): Regular analytics weekly train (2nd) THIN [analytics/refinery@359f4bd]
  • 09:33 joal@deploy2002: Finished deploy [analytics/refinery@359f4bd]: Regular analytics weekly train (2nd) [analytics/refinery@359f4bd] (duration: 05m 53s)
  • 09:28 joal@deploy2002: Started deploy [analytics/refinery@359f4bd]: Regular analytics weekly train (2nd) [analytics/refinery@359f4bd]
  • 09:23 claime: Re-enabling puppet for A:cp-upload - T331318
  • 09:16 claime: Running puppet on cp2028.codfw.wmnet (cp-upload noop test) - T331318
  • 09:15 claime: puppet disabled for A:cp-upload - T331318
  • 09:12 claime: puppet disabled for A:cp-text - T331318
  • 09:09 claime: Merging mw-on-k8s ATS lua routing script - T331318
  • 09:04 godog: silence LogstashIndexingFailures during investigation T180051
  • 08:55 elukey: move kafka main clusters to new truststore (PKI+Puppet root CA certs) - T319372
  • 08:54 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 00:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
  • 00:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
  • 00:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
  • 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
  • 00:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1207']
  • 00:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1207']
  • 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
  • 00:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 00:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
  • 00:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 00:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1225.mgmt.eqiad.wmnet with reboot policy FORCED

2023-03-29

  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2002.wikimedia.org with reason: WIP-known-to-be-debugged-new-host
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1224.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1223.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:48 mutante: contint2002 - a2dismod mpm_event (ONCE AGAIN this year old issue when applying roles with apache for the first time) - running puppet - now it can actually install PHP 7.3 and start apache T324659
  • 23:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 23:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1222.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1221.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1220.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1219.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1218.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1217.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1216.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1215.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1214.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1213.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 22:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1212.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 21:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
  • 21:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
  • 21:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit1003']
  • 21:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit1003']
  • 21:45 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads (duration: 00m 14s)
  • 21:45 ebernhardson@deploy2002: Started deploy [airflow-dags/search@ada9bb0]: disable auto-versioning of glent uploads
  • 21:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
  • 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
  • 20:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
  • 20:29 taavi@deploy2002: Finished scap: Backport for Add per-action component-level profiling in statsd using excimer (T225968) (duration: 11m 52s)
  • 20:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 20:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 20:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073']
  • 20:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073']
  • 20:18 taavi@deploy2002: aaron and taavi: Backport for Add per-action component-level profiling in statsd using excimer (T225968) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:17 taavi@deploy2002: Started scap: Backport for Add per-action component-level profiling in statsd using excimer (T225968)
  • 20:15 taavi@deploy2002: Finished scap: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681) (duration: 09m 45s)
  • 20:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
  • 20:10 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1210.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 taavi@deploy2002: nray and taavi: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:06 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1211.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:06 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:05 taavi@deploy2002: Started scap: Backport for Update "United States" static page to facilitate synthetic testing of T331681 (T331681)
  • 20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 19:50 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 19:48 volans@cumin1001: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:20 sukhe: force puppet agent run on A:lvs to additionally confirm nothing broke
  • 19:20 sukhe: [enable] puppet on A:lvs to roll out pybal prometheus-client change
  • 19:14 sukhe: disable puppet on A:lvs to roll out pybal prometheus-client change
  • 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 18:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1138 T333480', diff saved to https://phabricator.wikimedia.org/P45981 and previous config saved to /var/cache/conftool/dbconfig/20230329-185431-ladsgroup.json
  • 18:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary T333480', diff saved to https://phabricator.wikimedia.org/P45980 and previous config saved to /var/cache/conftool/dbconfig/20230329-185125-ladsgroup.json
  • 18:50 Amir1: Starting s4 eqiad failover from db1138 to db1160 - T333480
  • 18:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 dduvall@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.2 refs T330208 (duration: 05m 48s)
  • 18:39 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.2 refs T330208
  • 18:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:38 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3 (duration: 00m 16s)
  • 18:38 ebernhardson@deploy2002: Started deploy [airflow-dags/search@d66d6e0]: bump glent to 0.3.3
  • 18:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 18:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T333480', diff saved to https://phabricator.wikimedia.org/P45979 and previous config saved to /var/cache/conftool/dbconfig/20230329-182536-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T333480
  • 18:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T333480
  • 18:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
  • 18:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:16 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.2 refs T330208
  • 18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: PC maint
  • 17:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:43 brett: Re-enable puppet on A:cp - T284555
  • 17:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
  • 17:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 brett: Disable puppet on A:cp to roll out another T284555
  • 17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1209.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:18 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 17:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:11 brett: Re-enable puppet on A:cp - T284555
  • 16:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:44 brett: Disable puppet on A:cp to roll out T284555
  • 16:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bullseye
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:29 btullis@cumin1001: Added views for new wiki: anpwiki T332458
  • 16:05 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:51 btullis@cumin1001: Added views for new wiki: gucwiki T326235
  • 15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 15:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:29 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:28 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:27 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:27 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bullseye
  • 15:27 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:26 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:07 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
  • 15:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: Stop kafka, dist-upgrade
  • 15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:01 jgleeson: SmashPig upgraded from 758a34c1 to 240c80a2
  • 15:01 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive. (duration: 00m 13s)
  • 15:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4a7a6cc]: prefix hive properties with spark.hive.
  • 14:59 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
  • 14:58 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
  • 14:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2005-dev
  • 14:57 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2005-dev
  • 14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:49 XioNoX: Remove custom BGP graceful-shutdown on all core routers - T320230
  • 14:47 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:19 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) (duration: 07m 30s)
  • 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)
  • 14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) (duration: 08m 02s)
  • 14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:00 XioNoX: merge/deploy change in Puppet's modules/network/data/data.yaml - T327930
  • 13:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:56 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialRecentChangesLinked: Use SelectQueryBuilder directly (T333339)
  • 13:56 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:54 jgiannelos@deploy2002: Finished deploy [restbase/deploy@0d2f12f]: (no justification provided) (duration: 17m 59s)
  • 13:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:42 elukey: run dist-upgrade on kafka-main2002 to upgrade it to bullseye - T332013
  • 13:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
  • 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka, dist-upgrade
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript cleanupTitles.php gurwiki # T332241 (2 of 767 rows updated)
  • 13:37 sukhe: enable puppet on A:lvs to test Python 2 deprecation change: T321309
  • 13:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@0d2f12f]: (no justification provided)
  • 13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php gurwiki --fix # T332241 – 0 pages to fix (0 resolvable), 0 links to fix (0 resolvable, 0 deleted)
  • 13:30 XioNoX: enable vcp-snmp-statistics on fasw-c-codfw
  • 13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enabled native gallery editing in Parsoid (T329662) (duration: 10m 19s)
  • 13:29 sukhe: disable puppet on A:lvs to test Python 2 deprecation change: T321309
  • 13:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and arlolra: Backport for Enabled native gallery editing in Parsoid (T329662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:19 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enabled native gallery editing in Parsoid (T329662)
  • 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable history page visual diffs on remaining wikis (T314588) (duration: 08m 23s)
  • 13:12 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:10 dcausse@deploy2002: Finished deploy [airflow-dags/search@92e9876]: (no justification provided) (duration: 00m 14s)
  • 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Enable history page visual diffs on remaining wikis (T314588) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:09 dcausse@deploy2002: Started deploy [airflow-dags/search@92e9876]: (no justification provided)
  • 13:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable history page visual diffs on remaining wikis (T314588)
  • 13:01 XioNoX: test enabling lldp on mr1-ulsfo
  • 12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:55 XioNoX: test enabling lldp on pfw3-codfw
  • 12:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 12:43 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 12:22 btullis@cumin1001: Added views for new wiki: gurwiki T327841
  • 11:57 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:55 btullis@cumin1001: Added views for new wiki: shnwikivoyage T302798
  • 11:55 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:54 btullis@cumin1001: Added views for new wiki: guwwiktionary T309056
  • 11:54 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:53 btullis@cumin1001: Added views for new wiki: guwwiki T303761
  • 11:53 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:51 btullis@cumin1001: Added views for new wiki: kcgwiki T305280
  • 11:51 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:18 jgiannelos@deploy2002: deploy aborted: (no justification provided) (duration: 00m 01s)
  • 11:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@c265f3f] (beta): (no justification provided)
  • 11:12 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
  • 11:07 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing GraphQL - jbond@cumin2002"
  • 10:58 claime: authdns-update successful on all nodes - T333120
  • 10:57 claime: Running authdns-update
  • 10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int,name=codfw
  • 10:55 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro
  • 10:52 claime: Running puppet on dns-auth - T333120
  • 10:50 claime: Switching mw-api-int to production - T333120
  • 10:50 claime: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
  • 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
  • 10:46 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)
  • 10:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)
  • 10:41 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)
  • 10:37 claime: Switching mw-api-int to lvs_setup - T333120
  • 10:21 hnowlan@deploy2002: Finished deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki T332093 T332379 (duration: 19m 30s)
  • 10:02 hnowlan@deploy2002: Started deploy [restbase/deploy@c265f3f]: Add ckbwiktionary, anpwiki T332093 T332379
  • 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:58 claime: running puppet on O:kubernetes::worker and O:lvs::balancer - T333120
  • 09:58 denisse: updating prometheus3001 to bullseye
  • 09:57 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:57 claime: Adding mw-api-int to service_catalog in service_setup - T333120
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: a good reason - ayounsi@cumin1001
  • 09:33 filippo@deploy2002: Finished scap: Backport for Revert "Failover statsd to graphite2004" (duration: 07m 34s)
  • 09:27 filippo@deploy2002: filippo: Backport for Revert "Failover statsd to graphite2004" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:26 filippo@deploy2002: Started scap: Backport for Revert "Failover statsd to graphite2004"
  • 09:02 elukey: move kafka on kafka-jumbo1001 to PKI TLS certs - T296064
  • 09:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: restart kafka, upgrade to PKI
  • 08:03 volans: installed spicerack v6.4.0 on cumin1001
  • 07:37 kartik@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995) (duration: 12m 35s)
  • 07:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
  • 07:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: Stop kafka, dist-upgrade
  • 07:31 oblivian@deploy2002: Finished deploy [restbase/deploy@11477d6]: Updating stale nodes, T333069 (duration: 32m 07s)
  • 07:27 volans: installed spicerack v6.4.0 on cumin2002
  • 07:26 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:25 kartik@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20230329 (T333128 T328533 T317995)
  • 07:07 slyngs: Update Squid logformat (urldownloader[1001-1002,2001-2002,2004].wikimedia.org)
  • 06:59 oblivian@deploy2002: Started deploy [restbase/deploy@11477d6]: Updating stale nodes, T333069
  • 06:47 hashar: Restarted Gerrit
  • 06:43 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 10s)
  • 06:43 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
  • 06:42 hashar: gerrit2002: restarted Gerrit replica instance
  • 06:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins (duration: 00m 06s)
  • 06:40 hashar@deploy2002: Started deploy [gerrit/gerrit@e7c1696]: Update Gerrit javascript plugins
  • 06:38 phedenskog@deploy2002: Finished deploy [performance/navtiming@f6c9fa3]: (no justification provided) (duration: 00m 05s)
  • 06:38 phedenskog@deploy2002: Started deploy [performance/navtiming@f6c9fa3]: (no justification provided)
  • 06:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 06:21 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
  • 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
  • 00:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet
  • 00:37 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet
  • 00:30 sukhe: restart pybal on lvs1018 to hopefully resolve flapping BGP session
  • 00:06 zabe@deploy2002: Finished scap: Backport for throttle: Remove expired throttle (duration: 07m 19s)
  • 00:00 zabe@deploy2002: zabe: Backport for throttle: Remove expired throttle synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet

2023-03-28

  • 23:59 zabe@deploy2002: Started scap: Backport for throttle: Remove expired throttle
  • 23:46 zabe@deploy2002: Finished scap: T331831 (duration: 06m 50s)
  • 23:39 zabe@deploy2002: Started scap: T331831
  • 23:34 zabe@deploy2002: Finished scap: T331831 (duration: 07m 01s)
  • 23:27 zabe@deploy2002: Started scap: T331831
  • 23:27 zabe: central Kurdish Wiktionary (ckbwiktionary)
  • 22:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
  • 22:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:44 eileen: civicrm upgraded from db3b727e to 183d131d
  • 21:23 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments (duration: 00m 13s)
  • 21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments
  • 21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:06 urandom: updating image_suggestions default table TTL(s) from 1209600 to 1814400 (seconds) — T333319
  • 21:05 phedenskog@deploy2002: Finished deploy [performance/navtiming@4d22874]: (no justification provided) (duration: 00m 06s)
  • 21:05 phedenskog@deploy2002: Started deploy [performance/navtiming@4d22874]: (no justification provided)
  • 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:03 urbanecm@deploy2002: Finished scap: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733) (duration: 28m 53s)
  • 20:51 urbanecm@deploy2002: urbanecm and matmarex: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:34 urbanecm@deploy2002: Started scap: Backport for Only run edit check on main namespace, Change name of the editcheck-needreference tag to editcheck-references, Enable hidden tag for "Edit Check" project on Wikipedias (T324733)
  • 20:27 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes (duration: 00m 13s)
  • 20:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes
  • 20:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:02 ejegg: payments-wiki upgraded from f5ec2677 to b5df483f
  • 19:29 dduvall@deploy2002: Pruned MediaWiki: 1.40.0-wmf.27 (duration: 02m 11s)
  • 19:26 dduvall@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs T330208 (duration: 07m 24s)
  • 19:19 dduvall@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
  • 18:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:37 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance (duration: 00m 20s)
  • 18:36 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance
  • 18:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
  • 18:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002"
  • 18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=ats-be
  • 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=cdn
  • 16:52 volans: uploaded spicerack_6.4.0 to apt.wikimedia.org bullseye-wikimedia (but I'll deploy it to the cumin hosts tomorrow)
  • 16:10 jnuche@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs T330208 (duration: 49m 52s)
  • 16:09 bblack: reboot cp1082 (NIC issues)
  • 16:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=ats-be
  • 16:03 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=cdn
  • 16:00 inflatador: bking@cumin1001 unban elastic and cloudelastic nodes post maintenance T330165
  • 15:57 btullis@deploy2002: Finished deploy [analytics/refinery@6554ec0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6554ec0] (duration: 01m 32s)
  • 15:20 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
  • 15:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:15 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:14 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:08 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 15:05 jnuche@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.41.0-wmf.2" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 03s)
  • 15:05 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs T330208
  • 14:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=eqiad
  • 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=device-analytics,name=pki
  • 14:53 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=device-analytics,name=eqiad
  • 14:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=device-analytics
  • 14:51 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row B switches upgrade done - T330165
  • 14:48 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 14:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor100[12].eqiad.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 14:32 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches upgrade done - T330165
  • 14:31 sukhe: run authdns-update to revert eqiad depool
  • 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=THANOS-FE-OLD-FQDN,service=thanos-web
  • 14:05 XioNoX: reboot eqiad row B for upgrade - T330165
  • 13:58 godog: depool thanos-fe1002 - T330165
  • 13:54 Emperor: depool ms-fe1010 before switch work T330165
  • 13:53 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
  • 13:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 13:47 akosiaris: depool swift in eqiad for row B upgrade
  • 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
  • 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 13:46 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 249 hosts with reason: eqiad row B upgrade
  • 13:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:45 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:44 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 13:41 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
  • 13:36 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:34 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor,name=eqiad
  • 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1002.eqiad.wmnet
  • 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
  • 13:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row B switches upgrade - T330165
  • 12:59 XioNoX: depool eqiad for network maintenance - T330165
  • 12:58 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches upgrade - T330165
  • 12:57 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:56 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 12:44 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 12:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 12:43 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 12:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 12:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 12:36 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye
  • 12:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 12:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 12:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
  • 12:21 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage
  • 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295
  • 12:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 45295
  • 12:09 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye
  • 11:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 11:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 11:56 elukey: dist-upgrade kafka-main1002 to debian bullseye - T332013
  • 11:51 ladsgroup@deploy2002: Finished scap: Backport for api: Mark query as read-only to avoid regex on SQL (T332942) (duration: 18m 42s)
  • 11:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:37 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:34 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:34 ladsgroup@deploy2002: ladsgroup: Backport for api: Mark query as read-only to avoid regex on SQL (T332942) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:32 ladsgroup@deploy2002: Started scap: Backport for api: Mark query as read-only to avoid regex on SQL (T332942)
  • 11:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:22 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
  • 10:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
  • 09:56 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
  • 09:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues
  • 09:41 vgutierrez: resetting cp2035 management card - T333312
  • 09:38 elukey: dist-upgrade kafka-main1001 to bullseye - T332013
  • 09:36 godog: silence systemdunitfailed alerts for team=wmcs - T333315
  • 09:35 vgutierrez: depool cp2035 - T333312
  • 09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 09:12 jbond@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts
  • 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts
  • 09:11 jbond@cumin1001: END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
  • 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts
  • 08:58 vgutierrez: restart ipmiseld on cp2035
  • 08:50 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
  • 08:49 ayounsi@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:48 AndyRussG: update payments.wiki config 65bedd4a -> e31ffd7d, payments (automatic updates only) a6c6c2b1 -> f5ec2677
  • 08:45 ayounsi@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:43 ayounsi@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:42 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 08:39 ayounsi@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:37 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:35 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:34 ayounsi@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:32 ayounsi@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:32 phedenskog@deploy2002: Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s)
  • 08:32 phedenskog@deploy2002: Started deploy [performance/navtiming@e757bdf]: (no justification provided)
  • 08:31 ayounsi@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:29 ayounsi@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:25 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:21 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:14 ayounsi@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:11 oblivian@deploy2002: Finished scap: Backport for Failover statsd to graphite2004 (T330165) (duration: 08m 48s)
  • 08:08 ayounsi@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 08:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance
  • 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance
  • 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance
  • 08:04 oblivian@deploy2002: oblivian and filippo: Backport for Failover statsd to graphite2004 (T330165) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
  • 08:03 ayounsi@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance
  • 08:02 oblivian@deploy2002: Started scap: Backport for Failover statsd to graphite2004 (T330165)
  • 08:02 ayounsi@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:00 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:00 godog: move graphite reads to codfw - T330165
  • 07:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:56 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:54 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:54 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:51 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:51 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45965 and previous config saved to /var/cache/conftool/dbconfig/20230328-073122-root.json
  • 07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17806
  • 07:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 17806
  • 07:20 kartik@deploy2002: Finished scap: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834) (duration: 12m 05s)
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45964 and previous config saved to /var/cache/conftool/dbconfig/20230328-071617-root.json
  • 07:10 kartik@deploy2002: kartik: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:08 kartik@deploy2002: Started scap: Backport for Enable Section Translation on some wikis while Content Translation remains in beta (T308834)
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45963 and previous config saved to /var/cache/conftool/dbconfig/20230328-070112-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45962 and previous config saved to /var/cache/conftool/dbconfig/20230328-064607-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45961 and previous config saved to /var/cache/conftool/dbconfig/20230328-063103-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45960 and previous config saved to /var/cache/conftool/dbconfig/20230328-061558-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 T329481', diff saved to https://phabricator.wikimedia.org/P45959 and previous config saved to /var/cache/conftool/dbconfig/20230328-061441-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P45958 and previous config saved to /var/cache/conftool/dbconfig/20230328-060053-root.json
  • 05:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:53 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 05:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:47 AndyRussG: update payments-wiki f5e262d1 -> a6c6c2b1
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P45957 and previous config saved to /var/cache/conftool/dbconfig/20230328-054548-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P45956 and previous config saved to /var/cache/conftool/dbconfig/20230328-053043-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45955 and previous config saved to /var/cache/conftool/dbconfig/20230328-051539-root.json
  • 01:59 krinkle@deploy2002: Synchronized wmf-config/mc.php: I44edcd (duration: 06m 33s)
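
Note on the db1179 entries above: the dbctl commits step through 1, 2, 3, 4, 5, 10, 25, 50, 75 and 100% at roughly 15-minute intervals, starting at 05:15. The following is a minimal sketch of that stepped schedule, not the tooling actually used. The function name `repool_schedule` and the fixed 15-minute interval are assumptions for illustration, inferred only from the timestamps in this log.

```python
from datetime import datetime, timedelta

# Percentages exactly as they appear in the db1179 "(re)pooling" entries above.
STEPS = [1, 2, 3, 4, 5, 10, 25, 50, 75, 100]


def repool_schedule(start, steps=STEPS, interval_minutes=15):
    """Return (time, percentage) pairs, one step per interval.

    Hypothetical helper: the 15-minute spacing is an assumption read off
    the log timestamps, not a documented dbctl setting.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


if __name__ == "__main__":
    # First db1179 step in the log was committed at 05:15 on 2023-03-28.
    for when, pct in repool_schedule(datetime(2023, 3, 28, 5, 15)):
        print(f"{when:%H:%M} db1179 (re)pooling @ {pct}%")
```

The logged commits drift by a few seconds per step, so the real 100% entry appears at 07:31 rather than the 07:30 this idealised schedule gives.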

2023-03-27

  • 23:47 mutante: people1003 - taking down apache to provoke monitoring alert (inactive instances) and confirm IRC alerting change works
  • 23:31 zabe: deployed patch for T330968
  • 23:08 zabe@deploy2002: Finished scap: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514) (duration: 21m 27s)
  • 23:00 zabe@deploy2002: zabe: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:48 mutante: stat1005 - kill 18179; run puppet ; stat1007 - kill 3346; run puppet ; stat1006 - kill 23887 run puppet
  • 22:47 zabe@deploy2002: Started scap: Backport for Rename "Support and Safety" to "Trust and Safety" (T330514)
  • 22:43 mutante: stat1004 - kill 29291; run puppet
  • 22:43 mutante: apt2001 - kill 3105; run puppet
  • 22:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Meta:WMF Support and Safety" "Meta:WMF Trust and Safety" "Zabe" --reason "per T330514" # T330514
  • 21:58 maryum: Deploy security fix for T326952
  • 21:58 urandom: power cycling restbase1033 — T333243
  • 21:45 ryankemper: T330165 Depooled relevant search platform hosts: `sudo -E cumin 'elastic[1055-1056,1074-1079,1085-1086]*,cloudelastic100[2,6]*,wcqs1002*,wdqs[1007,1012]*' 'sudo depool'`
  • 21:24 Amir1: start of watchlist clean up in arwiki (T328501)
  • 21:23 kindrobot: finish UTC late backports
  • 21:22 kindrobot@deploy2002: Finished scap: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279) (duration: 08m 26s)
  • 21:15 kindrobot@deploy2002: kindrobot and superpes: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:15 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided) (duration: 00m 13s)
  • 21:14 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided)
  • 21:14 kindrobot@deploy2002: Started scap: Backport for Disable VisualEditor from talk namespace, [sysop_itwiki] Add the logo also for vector 2022 (T330279)
  • 21:11 tzatziki: moving Universal Code of Conduct/Enforcement guidelines -> Universal Code of Conduct/Enforcement guidelines/Version 1 on metawiki with `extensions/Translate/scripts/moveTranslatableBundle.php `
  • 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1022.eqiad.wmnet
  • 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:41 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1022.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1021.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:33 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:31 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1021.eqiad.wmnet
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1017.eqiad.wmnet
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:23 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:21 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:20 kindrobot@deploy2002: Finished scap: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093) (duration: 10m 50s)
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1017.eqiad.wmnet
  • 20:11 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:10 kindrobot@deploy2002: Started scap: Backport for Expand list of wikis with language button at top. (T331777), Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)
  • 20:01 kindrobot: start UTC late backport window
  • 19:21 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2 (duration: 00m 14s)
  • 19:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2
  • 19:06 jgleeson: civicrm upgraded from 09373b9d to db3b727e
  • 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:39 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:39 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:34 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:34 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:33 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:25 jgleeson: payments-wiki upgraded from 36366f64 to f5e262d1
  • 15:55 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided) (duration: 00m 11s)
  • 15:54 ebysans@deploy2002: Started deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided)
  • 15:20 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:17 elukey@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 10s)
  • 15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict1002.eqiad.wmnet
  • 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict1002.eqiad.wmnet on all recursors
  • 14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict1002.eqiad.wmnet on all recursors
  • 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
  • 14:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001"
  • 14:52 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 14:52 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host aphlict1002.eqiad.wmnet
  • 14:48 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:48 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:47 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:47 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:44 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:43 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:40 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:29 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:29 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 14:28 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 14:27 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:16 taavi: taavi@mwmaint2002 ~ $ mwscript namespaceDupes.php --wiki=huwiki --fix # T333083
  • 14:15 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:15 taavi@deploy2002: Finished scap: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166) (duration: 08m 27s)
  • 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:14 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:08 taavi@deploy2002: taavi: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:06 taavi@deploy2002: Started scap: Backport for namespaceDupes: Remove extra addQuotes() calls (T333166)
  • 13:35 fab@deploy2002: Finished deploy [airflow-dags/research@d2c115d]: (no justification provided) (duration: 00m 21s)
  • 13:35 fab@deploy2002: Started deploy [airflow-dags/research@d2c115d]: (no justification provided)
  • 13:12 taavi@deploy2002: Finished scap: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083) (duration: 08m 45s)
  • 13:04 taavi@deploy2002: superpes and taavi: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 taavi@deploy2002: Started scap: Backport for [huwiki] Add Draft and Draft_talk namespaces (T333083)
  • 12:42 godog: flip alert* to overlay2 - T329939
  • 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 10:31 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:30 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:10 elukey: dist-upgrade kafka-main1003 manually to bullseye - T332013
  • 10:03 Emperor: depool ms-fe2009
  • 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 09:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade
  • 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45295
  • 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45295
  • 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:39 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
  • 08:57 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001"
  • 08:55 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 08:47 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 08:39 ladsgroup@deploy1002: Finished scap: Backport for EntityUsageTable: Mark query as read-only (T332941) (duration: 18m 15s)
  • 08:30 ladsgroup@deploy1002: ladsgroup: Backport for EntityUsageTable: Mark query as read-only (T332941) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:28 jynus: restarting bacula at backup1001 T331510
  • 08:25 urbanecm@deploy2002: Synchronized wmf-config/InitialiseSettings.php: 63dd23b: [Growth] eswiki: Enable mentorship for 50% of newcomers (T332737, T285235) (duration: 06m 09s)
  • 08:21 ladsgroup@deploy1002: Started scap: Backport for EntityUsageTable: Mark query as read-only (T332941)
  • 08:18 urbanecm@deploy2002: Backport cancelled.
  • 08:06 urbanecm@deploy2002: Finished scap: Backport for GrowthMentors.json: Add a write-only username field (T331444) (duration: 07m 52s)
  • 08:03 marostegui: Failover m1 from db1164 to db1101 - T331510
  • 08:00 urbanecm@deploy2002: urbanecm: Backport for GrowthMentors.json: Add a write-only username field (T331444) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:58 urbanecm@deploy2002: Started scap: Backport for GrowthMentors.json: Add a write-only username field (T331444)
  • 07:55 urbanecm@deploy2002: Finished scap: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075) (duration: 16m 45s)
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45949 and previous config saved to /var/cache/conftool/dbconfig/20230327-075206-root.json
  • 07:48 urbanecm@deploy2002: urbanecm: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:39 jynus: disabling puppet and shutting down bacula at backup1001 T331510
  • 07:38 urbanecm@deploy2002: Started scap: Backport for SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45948 and previous config saved to /var/cache/conftool/dbconfig/20230327-073701-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45947 and previous config saved to /var/cache/conftool/dbconfig/20230327-072156-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45946 and previous config saved to /var/cache/conftool/dbconfig/20230327-070651-root.json
  • 06:51 marostegui: dbmaint s3 eqiad Rename flaggedrevs tables on db1123 ptwikisource T332594
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45945 and previous config saved to /var/cache/conftool/dbconfig/20230327-065147-root.json
  • 06:40 marostegui: Rename flaggedrevs tables on db1123 ptwikisource T332594
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45944 and previous config saved to /var/cache/conftool/dbconfig/20230327-063642-root.json
  • 05:40 kart_: Updated cxserver to 2023-03-17-133444-production (T332379 + build changes)
  • 05:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:37 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:24 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:23 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T332292', diff saved to https://phabricator.wikimedia.org/P45942 and previous config saved to /var/cache/conftool/dbconfig/20230327-051941-root.json
  • 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510
  • 05:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510

2023-03-25

  • 07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s)
  • 07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0
  • 00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
  • 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
  • 00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host
  • 00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer
  • 00:09 tzatziki: removing 2 files for legal compliance

2023-03-24

  • 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
  • 23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
  • 23:50 tzatziki: removing 1 file for legal compliance
  • 21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after I4fe17f which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. T328907
  • 18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s)
  • 18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable
  • 18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s)
  • 18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags
  • 18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s)
  • 18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag
  • 17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s)
  • 17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.10.0, and add transfer_to_es dag
  • 15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request T332917" # T332917
  • 11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
  • 10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:00 marostegui: Upgrade db1204 to mariadb 10.6 T330861
  • 08:57 hashar: Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the Github RSA ssh host key T332972
  • 05:37 hashar: gerrit: refreshed ssh host key for `github.com`
  • 05:28 hashar: Restarted Gerrit
  • 05:26 hashar: Stopping Gerrit
  • 05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068) (duration: 00m 10s)
  • 05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068)
  • 05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org
  • 05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068) (duration: 00m 07s)
  • 05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot (T330850) and displaying Zuul status on changes (T241068)
  • 05:17 hashar: Restarting Gerrit for deploying plugins updates
  • 05:10 ejegg: Standalone SmashPig upgraded from 3b84e4cb to 50139e82
  • 05:04 ejegg: payments-wiki upgraded from 4d0c90b4 to 4b0a71fa
  • 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply

2023-03-23

  • 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - T332952
  • 22:20 mutante: moscovium performing apt-get full-upgrade T332952
  • 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye by replacing the release name in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines; this requirement is described as a bug at https://shagain.club/index.php/archives/641/ - T327068
  • 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) T327068
  • 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
  • 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) T327068
  • 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
  • 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
  • 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
  • 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
  • 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
  • 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
  • 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
  • 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
  • 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 20:33 taavi@deploy2002: Finished scap: Backport for MessageWebImporter: Use translation instead of language code on import (T323430) (duration: 10m 56s)
  • 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
  • 20:24 taavi@deploy2002: abi and taavi: Backport for MessageWebImporter: Use translation instead of language code on import (T323430) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:23 taavi@deploy2002: Started scap: Backport for MessageWebImporter: Use translation instead of language code on import (T323430)
  • 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
  • 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
  • 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
  • 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
  • 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
  • 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
  • 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
  • 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1 refs T330207
  • 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 17:38 mutante: moscovium - systemctl stop rsync
  • 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
  • 16:59 sukhe: rolling out CR 901333 to A:cp-text T313578
  • 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
  • 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
  • 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
  • 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
  • 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
  • 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
  • 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
  • 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - T332796
  • 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
  • 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
  • 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
  • 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
  • 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
  • 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
  • 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
  • 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
  • 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
  • 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
  • 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
  • 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
  • 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
  • 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
  • 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
  • 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
  • 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
  • 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
  • 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
  • 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
  • 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
  • 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
  • 13:46 TheresNoTime: close UTC afternoon backport window
  • 13:45 samtar@deploy2002: Finished scap: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759) (duration: 07m 46s)
  • 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
  • 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
  • 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
  • 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
  • 13:39 samtar@deploy2002: samtar: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:37 samtar@deploy2002: Started scap: Backport for core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)
  • 13:36 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: disable add a link backend (T304551) (duration: 08m 05s)
  • 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
  • 13:29 samtar@deploy2002: samtar and sgimeno: Backport for GrowthExperiments: disable add a link backend (T304551) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:28 samtar@deploy2002: Started scap: Backport for GrowthExperiments: disable add a link backend (T304551)
  • 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` T332470
  • 13:25 samtar@deploy2002: Finished scap: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470) (duration: 08m 39s)
  • 13:18 samtar@deploy2002: samtar and superpes: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:16 samtar@deploy2002: Started scap: Backport for [trwikiquote] Removing the temporary logo (already reverted) (T329399), [ckbwiki] Add Draft and Draft_talk namespaces (T332470)
  • 13:15 samtar@deploy2002: Finished scap: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813) (duration: 11m 47s)
  • 13:08 samtar@deploy2002: samtar and superpes: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:03 samtar@deploy2002: Started scap: Backport for [dkwikimedia] Fixing current logo with an HD version (T332784), [ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)
  • 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
  • 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
  • 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
  • 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
  • 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - T332796
  • 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
  • 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
  • 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
  • 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
  • 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
  • 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
  • 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
  • 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
  • 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
  • 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
  • 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
  • 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
  • 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
  • 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
  • 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
  • 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
  • 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
  • 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
  • 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
  • 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
  • 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia T332584 T332589
  • 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - T332803
  • 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - T332796
  • 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
  • 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - T332796
  • 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - T332803
  • 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
  • 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
  • 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
  • 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
  • 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - T332803
  • 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
  • 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
  • 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
  • 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
  • 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for T331896 - no automatic sync / deploy for these
  • 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - T332812"
  • 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - T332812"
  • 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
  • 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
  • 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
  • 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye
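
The sources.list rewrite described in the 22:00 and 22:09 moscovium entries above can be sketched in Python as follows. This is only an illustration under stated assumptions, not what was run on the host: the function name `upgrade_sources` and the sample lines are hypothetical, the file is assumed to use one-line-style `deb` entries, and the security suite is assumed to read `bullseye/updates` or `bullseye-updates` after the plain string replacement.

```python
import re


def upgrade_sources(text: str) -> str:
    """Rewrite buster entries to bullseye in sources.list-style text.

    Hypothetical helper for illustration only.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        # Step 1: plain release-name replacement, as in the 22:00 entry.
        line = line.replace("buster", "bullseye")
        # Step 2: only on security.debian.org lines, rename the suite to
        # bullseye-security, as in the 22:09 entry. Other lines keep a
        # legitimate "bullseye-updates" suite unchanged.
        if "security.debian.org" in line:
            line = re.sub(r"\bbullseye[/-]updates\b", "bullseye-security", line)
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "deb http://deb.debian.org/debian buster main",
    "deb http://deb.debian.org/debian buster-updates main",
    "deb http://security.debian.org/debian-security buster/updates main",
])
print(upgrade_sources(sample))
```

Restricting the second substitution to security.debian.org lines matters because `bullseye-updates` is still the expected suite name on the regular mirror lines.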

2023-03-22

  • 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
  • 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
  • 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
  • 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
  • 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
  • 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
  • 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
  • 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # T332782
  • 23:31 zabe@deploy2002: Finished scap: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782) (duration: 10m 03s)
  • 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
  • 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
  • 23:22 zabe@deploy2002: zabe: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:21 zabe@deploy2002: Started scap: Backport for wikimaniawiki: Add namespace for 2024 wikimania (T332782)
  • 21:15 taavi: UTC late backports complete
  • 21:13 taavi@deploy2002: Finished scap: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031) (duration: 07m 29s)
  • 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
  • 21:08 taavi@deploy2002: taavi: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:06 taavi@deploy2002: Started scap: Backport for Remove OATHAuthMultipleDevicesMigrationStage from CS, [beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)
  • 21:05 taavi@deploy2002: Finished scap: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS (duration: 07m 17s)
  • 20:59 taavi@deploy2002: taavi: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:58 taavi@deploy2002: Started scap: Backport for Set OATHAuthMultipleDevicesMigrationStage in IS
  • 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable page tools for anonymous users (T331052) (duration: 10m 10s)
  • 20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for T332803
  • 20:36 samtar@deploy2002: Finished scap: Backport for Enable pinning for anon main menu when page tools is enabled (T331657) (duration: 11m 47s)
  • 20:32 akosiaris: reboot kubernetes1023 for a test once more, T332803
  • 20:32 akosiaris: reboot kubernetes1023 for a test once more
  • 20:28 samtar@deploy2002: samtar and nray: Backport for Enable pinning for anon main menu when page tools is enabled (T331657) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:25 akosiaris: reboot kubernetes1023 for a test
  • 20:24 samtar@deploy2002: Started scap: Backport for Enable pinning for anon main menu when page tools is enabled (T331657)
  • 20:23 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813) (duration: 09m 57s)
  • 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
  • 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
  • 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:15 samtar@deploy2002: kharlan and samtar: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:13 samtar@deploy2002: Started scap: Backport for GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)
  • 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
  • 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
  • 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
  • 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
  • 20:09 samtar@deploy2002: Finished scap: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config (duration: 07m 22s)
  • 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
  • 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
  • 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
  • 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
  • 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
  • 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
  • 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
  • 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:04 samtar@deploy2002: samtar and matmarex: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
  • 20:02 samtar@deploy2002: Started scap: Backport for Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745), Clean up DiscussionTools labs config
  • 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
  • 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
  • 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1 refs T330207
  • 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. Most other things are auto-deployed by puppet, by puppet running the initial scap, or by automatic rsync; this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ T331896 - but also see T332101
  • 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
  • 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
  • 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
  • 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # T332121 (duration: 06m 44s)
  • 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
  • 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
  • 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
  • 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
  • 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - T332796
  • 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
  • 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia T332584 T332589
  • 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
  • 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
  • 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
  • 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
  • 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
  • 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
  • 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
  • 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
  • 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 15:22 hnowlan: removing java packages from maps hosts
  • 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:13 hnowlan: removing cassandra packages from maps hosts
  • 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
  • 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
  • 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
  • 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
  • 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
  • 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
  • 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
  • 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
  • 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
  • 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) T323961
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted T323961', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
  • 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
  • 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
  • 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
  • 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 11:02 jbond: upgrade prometheus-ipmi-exporter on buster and bullseye
  • 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
  • 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
  • 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh one works fine)
  • 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
  • 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
  • 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
  • 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
  • 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
  • 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
  • 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
  • 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
  • 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
  • 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
  • 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
  • 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
  • 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
  • 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
  • 00:50 zabe@deploy2002: Started scap: update interwiki cache
  • 00:47 zabe@deploy2002: Finished scap: T332115 (duration: 06m 56s)
  • 00:40 zabe@deploy2002: Started scap: T332115
  • 00:40 zabe: create Wikipedia Angika (anpwiki) # T332115
  • 00:38 zabe@deploy2002: Finished scap: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118) (duration: 27m 00s)
  • 00:29 zabe@deploy2002: zabe: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 00:11 zabe@deploy2002: Started scap: Backport for Add namespace translations for Angika (T332118), Add namespace translations for Angika (T332118), Add namespaces, linktrail and digit transform table for Angika (T332118)

2023-03-21

  • 23:46 zabe@deploy2002: Finished scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) (duration: 30m 08s)
  • 23:35 zabe@deploy2002: zabe: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:15 zabe@deploy2002: Started scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)
  • 23:07 zabe@deploy2002: Finished scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers" (duration: 07m 10s)
  • 23:00 zabe@deploy2002: Started scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers"
  • 22:47 ejegg: payments-wiki upgraded from 0fd66b1f to ab0a55a2
  • 22:10 urbanecm@deploy2002: Finished scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) (duration: 07m 15s)
  • 22:04 urbanecm@deploy2002: urbanecm: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:03 urbanecm@deploy2002: Started scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)
  • 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 21:02 AndyRussG: update SmashPig config 6e651fd4 -> 035f602a
  • 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 20:48 taavi: start T315510 migration script on group2 s7 wikis
  • 20:39 taavi@deploy2002: Finished scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config (duration: 09m 01s)
  • 20:31 taavi@deploy2002: matmarex and taavi: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 taavi@deploy2002: Started scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config
  • 20:20 taavi@deploy2002: Finished scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) (duration: 17m 40s)
  • 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:02 taavi@deploy2002: Started scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)
  • 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
  • 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
  • 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
  • 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
  • 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
  • 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
  • 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
  • 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1 refs T330207
  • 18:00 AndyRussG: update SmashPig config 59a8b2d2 -> 6e651fd
  • 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
  • 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
  • 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
  • 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
  • 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
  • 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
  • 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
  • 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
  • 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
  • 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
  • 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
  • 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
  • 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
  • 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
  • 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
  • 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
  • 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:42 jbond: stop puppet from deploying this further
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
  • 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
  • 15:26 samtar@deploy2002: Finished scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) (duration: 09m 11s)
  • 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:19 samtar@deploy2002: samtar: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:17 samtar@deploy2002: Started scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)
  • 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:10 samtar@deploy2002: Finished scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) (duration: 09m 32s)
  • 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:02 samtar@deploy2002: samtar: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 15:00 samtar@deploy2002: Started scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)
  • 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
  • 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
  • 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 14:38 hnowlan: disabling puppet on maps* before merging 760619
  • 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
  • 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
  • 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:10 urbanecm@deploy2002: Finished scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443) (duration: 07m 53s)
  • 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:02 urbanecm@deploy2002: Started scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)
  • 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - T319372
  • 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
  • 11:08 joal: Kill mediacounts_load oozie job
  • 11:07 joal: Unpause mediawiki_history_denormalize airflow job
  • 11:06 joal: Kill mediawiki_denormalize oozie job
  • 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
  • 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
  • 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
  • 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
  • 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
  • 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
  • 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
  • 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
  • 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
  • 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
  • 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
  • 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372
  • 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
  • 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
  • 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
  • 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs T330207 (duration: 52m 38s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs T330207

2023-03-20

  • 22:00 samtar@deploy2002: Finished scap: Backport for Add languages to Minerva HTML (T331905) (duration: 09m 45s)
  • 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for Add languages to Minerva HTML (T331905) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:50 samtar@deploy2002: Started scap: Backport for Add languages to Minerva HTML (T331905)
  • 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` T332614
  • 21:25 TheresNoTime: closing UTC late backport window, extended
  • 21:22 samtar@deploy2002: Finished scap: Backport for Rename project and project talk namespace for shwiki (T332614) (duration: 12m 22s)
  • 21:11 samtar@deploy2002: samtar and aleksandar: Backport for Rename project and project talk namespace for shwiki (T332614) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:10 samtar@deploy2002: Started scap: Backport for Rename project and project talk namespace for shwiki (T332614)
  • 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
  • 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
  • 21:09 samtar@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) (duration: 08m 34s)
  • 21:02 samtar@deploy2002: matmarex and samtar: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:00 TheresNoTime: extending UTC late backport window
  • 21:00 samtar@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407)
  • 20:58 kharlan@deploy2002: Finished scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) (duration: 10m 28s)
  • 20:49 kharlan@deploy2002: kharlan: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
  • 20:47 kharlan@deploy2002: Started scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309)
  • 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) T257317 T332623 T331896
  • 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
  • 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
  • 18:56 ejegg: switched back to new PayPal pending transaction resolver
  • 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
  • 18:47 akosiaris: emergency rollover of redis password complete
  • 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
  • 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
  • 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
  • 18:42 ejegg: civicrm upgraded from 3d3606f1 to 09373b9d
  • 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost (T331896)
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
  • 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
  • 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP (T196968)
  • 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
  • 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
  • 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:11 TheresNoTime: close UTC afternoon backport window
  • 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
  • 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
  • 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` T331762
  • 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` T331762
  • 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` T332351
  • 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` T331762
  • 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") T331762
  • 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` T331762
  • 13:58 samtar@deploy2002: Finished scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) (duration: 09m 44s)
  • 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:49 samtar@deploy2002: Started scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762)
  • 13:47 samtar@deploy2002: Finished scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) (duration: 09m 26s)
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
  • 13:39 samtar@deploy2002: aleksandar and samtar: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:38 samtar@deploy2002: Started scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468)
  • 13:37 samtar@deploy2002: Finished scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) (duration: 08m 46s)
  • 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
  • 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
  • 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
  • 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
  • 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843 (duration: 01m 30s)
  • 13:29 samtar@deploy2002: stang and samtar: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
  • 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843
  • 13:28 samtar@deploy2002: Started scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439)
  • 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843 (duration: 01m 39s)
  • 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
  • 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843
  • 13:18 samtar@deploy2002: Finished scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) (duration: 11m 36s)
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
  • 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:08 samtar@deploy2002: stang and samtar: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 samtar@deploy2002: Started scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351)
  • 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
  • 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
  • 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
  • 09:21 claime: Repooling parse2004 - T332119
  • 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
  • 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
  • 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915

2023-03-19

  • 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) 27a5b481 -> 6359222d
  • 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
  • 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) 5dd37c9c -> 3d3606f1
  • 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
  • 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
  • 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)

2023-03-18

  • 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
  • 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
  • 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
  • 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 01:21 urandom: powercycling restbase2025 — T332462
  • 00:06 AndyRussG: Updating civicrm from 5dd37c9c to 3d3606f1

2023-03-17

  • 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
  • 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
  • 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
  • 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
  • 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
  • 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
  • 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
  • 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
  • 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
  • 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
  • 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
  • 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
  • 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
  • 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - T332119
  • 12:06 mutante: systemctl reset-failed on gitlab-runner*
  • 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
  • 02:10 ejegg: civicrm upgraded from 672950d9 to 5dd37c9c
  • 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
  • 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
  • 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
  • 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
  • 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
  • 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
  • 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates

2023-03-16

  • 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
  • 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
  • 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
  • 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
  • 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
  • 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
  • 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
  • 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
  • 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
  • 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
  • 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
  • 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
  • 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
  • 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
  • 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
  • 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
  • 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
  • 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
  • 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
  • 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
  • 22:24 ejegg: civicrm upgraded from 68fa85cf to 672950d9
  • 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs T330205
  • 20:36 brennen: 1.40.0-wmf.27 train (T330205): blockers hopefully resolved, rolling to all wikis
  • 20:35 TheresNoTime: close UTC late backport window
  • 20:35 samtar@deploy2002: Finished scap: Backport for Remove sampling from breadCrumbs schema (duration: 08m 18s)
  • 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for Remove sampling from breadCrumbs schema synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:26 samtar@deploy2002: Started scap: Backport for Remove sampling from breadCrumbs schema
  • 20:21 brennen@deploy2002: Finished scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) (duration: 09m 06s)
  • 20:14 brennen@deploy2002: brennen and jforrester: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:12 brennen@deploy2002: Started scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)
  • 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
  • 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
  • 18:41 wfan: enable monthlyconvert for cz
  • 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
  • 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
  • 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
  • 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
  • 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
  • 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
  • 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
  • 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
  • 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
  • 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
  • 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - T326363
  • 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
  • 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
  • 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
  • 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
  • 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
  • 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - T326363
  • 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
  • 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
  • 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
  • 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
  • 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
  • 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
  • 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
  • 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:06 urandom: ALTER-ing image_suggestions.suggestion table — T328670
  • 13:35 kostajh: UTC afternoon deploys done
  • 13:34 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag (duration: 07m 44s)
  • 13:28 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:27 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag
  • 13:15 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) (duration: 09m 48s)
  • 13:07 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:05 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813)
  • 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
  • 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
  • 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
  • 08:40 kostajh: UTC morning deploys (second round) done
  • 08:40 kharlan@deploy2002: Finished scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) (duration: 12m 30s)
  • 08:29 kharlan@deploy2002: kharlan: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:27 kharlan@deploy2002: Started scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227)
  • 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
  • 07:36 tgr_: UTC morning deploys done
  • 07:34 tgr@deploy2002: Finished scap: Backport for Leveling up: Backport recent changes (duration: 08m 13s)
  • 07:28 tgr@deploy2002: tgr: Backport for Leveling up: Backport recent changes synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:26 tgr@deploy2002: Started scap: Backport for Leveling up: Backport recent changes
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl T331874', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
  • 06:03 marostegui: Failover m5 from db1106 to db1176 - T332155
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T332155
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T332155
  • 03:29 ejegg: payments-wiki upgraded from 1532b107 to 0fd66b1f

2023-03-15

  • 22:55 tzatziki: Removing 1 file for legal compliance
  • 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 55s)
  • 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
  • 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 28s)
  • 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
  • 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
  • 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
  • 21:59 brennen: end of phabricator update window (T331915)
  • 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 40s)
  • 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
  • 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 28s)
  • 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
  • 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915) (duration: 00m 52s)
  • 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915)
  • 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
  • 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
  • 21:13 mutante: phab* - upgrading PHP packages
  • 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
  • 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
  • 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
  • 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915) (duration: 00m 31s)
  • 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915)
  • 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
  • 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
  • 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
  • 20:48 TheresNoTime: close UTC late backport window
  • 20:48 samtar@deploy2002: Finished scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki (duration: 08m 46s)
  • 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:39 samtar@deploy2002: Started scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki
  • 20:35 samtar@deploy2002: Finished scap: Backport for Deploy action blocks on itwiki (T330533) (duration: 10m 30s)
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
  • 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for Deploy action blocks on itwiki (T330533) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:25 samtar@deploy2002: Started scap: Backport for Deploy action blocks on itwiki (T330533)
  • 20:23 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) (duration: 10m 12s)
  • 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
  • 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
  • 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 20:15 samtar@deploy2002: sgimeno and samtar: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:13 samtar@deploy2002: Started scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)
  • 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
  • 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
  • 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
  • 20:11 taavi: deploy patch for T331192
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
  • 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
  • 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
  • 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
  • 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 19:49 taavi@deploy2002: Finished scap: Backport for extdist: Add REL1_40 (T329085) (duration: 12m 04s)
  • 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
  • 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
  • 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
  • 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
  • 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
  • 19:39 taavi@deploy2002: taavi: Backport for extdist: Add REL1_40 (T329085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:37 taavi@deploy2002: Started scap: Backport for extdist: Add REL1_40 (T329085)
  • 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
  • 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
  • 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
  • 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
  • 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
  • 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
  • 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
  • 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
  • 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
  • 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
  • 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. (T332115)
  • 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
  • 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs T330205 (duration: 06m 08s)
  • 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
  • 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs T330205
  • 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
  • 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
  • 18:06 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group1.
  • 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
  • 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
  • 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
  • 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
  • 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
  • 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
  • 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
  • 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
  • 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
  • 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
  • 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
  • 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
  • 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
  • 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
  • 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
  • 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 16:02 hnowlan: restarted thumbor-instances on thumbor1006
  • 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
  • 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:53 claime: Redeploying mw-on-k8s for php7.4 update T330270
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
  • 14:41 claime: Rebuilding mw-on-k8s images - T330270
  • 14:38 claime: Updating php7.4 production images
  • 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 14:24 daniel@deploy2002: Finished scap: Backport for Always write parsoid output to parser cache. (T320534) (duration: 09m 57s)
  • 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
  • 14:22 jbond: switch pki to be active active
  • 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 14:19 jbond: update pki to use discovery record
  • 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
  • 14:15 daniel@deploy2002: daniel: Backport for Always write parsoid output to parser cache. (T320534) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
  • 14:14 daniel@deploy2002: Started scap: Backport for Always write parsoid output to parser cache. (T320534)
  • 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: T321309
  • 14:12 sukhe: depool dns4002 for reimaging to bullseye: T321309
  • 14:00 moritzm: nodejs security updates on buster
  • 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
  • 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: T321309
  • 13:49 moritzm: installing graphite-web security updates
  • 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:17 taavi@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) (duration: 09m 01s)
  • 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
  • 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
  • 13:08 taavi@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635)
  • 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:18 marostegui: Failover m5 from db1176 to db1106 - T331877
  • 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T331877
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T331877
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 11:20 moritzm: imported packages into thirdparty/ceph-quincy
  • 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - T331268/25
  • 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
  • 09:22 moritzm: installing gnutls28 security updates
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T331875', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
  • 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199 (duration: 00m 19s)
  • 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199
  • 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 07:40 tgr_: UTC morning deploys done
  • 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
  • 07:36 tgr@deploy2002: Finished scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet (duration: 07m 54s)
  • 07:30 tgr@deploy2002: tgr: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:28 tgr@deploy2002: Started scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) T331874', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
  • 06:20 marostegui: Remove pki2001 from m1 grants T332018

2023-03-14

  • 23:29 brennen@deploy2002: Finished scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205) (duration: 10m 32s)
  • 23:20 brennen@deploy2002: brennen and umherirrender: Backport for action: Restrict action.delete.js to action=delete pages (T330205) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:19 brennen@deploy2002: Started scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205)
  • 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:43 ejegg: payments-wiki upgraded from 61c30a4f to 1532b107
  • 20:35 zabe@deploy2002: Finished scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) (duration: 08m 36s)
  • 20:28 zabe@deploy2002: zabe: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:27 zabe@deploy2002: Started scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921)
  • 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version T327919
  • 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 19:30 brennen: 1.40.0-wmf.27 train (T330205): uneventful at group0. i'm afk for about an hour.
  • 19:13 ejegg: civicrm upgraded from dbe3b716 to 68fa85cf
  • 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
  • 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
  • 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
  • 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27 refs T330205
  • 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
  • 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:03 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group0.
  • 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
  • 16:47 sukhe: rolling restart of pdns-rec to pick up config changes
  • 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
  • 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
  • 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
  • 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
  • 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
  • 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
  • 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
  • 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
  • 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
  • 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - T331541
  • 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
  • 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
  • 14:00 jbond: reimage pki1001
  • 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
  • 13:27 TheresNoTime: close UTC afternoon backport window
  • 13:26 samtar@deploy2002: Finished scap: Backport for arwiki: Add new throttle rule (T331973) (duration: 07m 24s)
  • 13:20 samtar@deploy2002: samtar and urbanecm: Backport for arwiki: Add new throttle rule (T331973) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:19 samtar@deploy2002: Started scap: Backport for arwiki: Add new throttle rule (T331973)
  • 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
  • 13:18 samtar@deploy2002: Finished scap: Backport for Enable VE on more namespaces on foundationwiki (T331079) (duration: 07m 55s)
  • 13:11 samtar@deploy2002: esanders and samtar: Backport for Enable VE on more namespaces on foundationwiki (T331079) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:10 samtar@deploy2002: Started scap: Backport for Enable VE on more namespaces on foundationwiki (T331079)
  • 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
  • 12:23 moritzm: installing git security updates
  • 12:20 samtar@deploy2002: Finished scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) (duration: 09m 12s)
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
  • 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of T297396 + T331680 — scap rolled back
  • 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
  • 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: T331541
  • 12:13 samtar@deploy2002: samtar and varnent: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 12:11 samtar@deploy2002: Started scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680)
  • 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
  • 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
  • 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: T331541
  • 12:06 claime: Unlocked scap deployments - T331541
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
  • 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: T331541
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
  • 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
  • 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: T331541
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
  • 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
  • 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
  • 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
  • 11:13 claime: We are encountering unexpected DNS anycast issues following T331541, latencies are increased but no production outage.
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
  • 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
  • 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T331541
  • 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T331541
  • 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
  • 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
  • 10:42 jbond: reimage pki-root1001
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
  • 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
  • 10:32 claime: Repooling all active/active services in eqiad - T331541
  • 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - T331541
  • 10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
  • 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
  • 10:15 jayme: enabling puppet on P:calico::kubernetes for T325268
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
  • 10:02 claime: Locking scap deployment for service switchover - T331541
  • 10:00 claime: Locking scap deployment for service switchover - T330651
  • 09:56 jayme: disabling puppet on P:calico::kubernetes for T325268
  • 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
  • 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 moritzm: installing NSS security updates
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
  • 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:23 Emperor: reboot ms-be2040 T331860
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
  • 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
  • 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
  • 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 T322294
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
  • 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
  • 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
  • 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122
  • 04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27 refs T330205 (duration: 51m 02s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27 refs T330205
  • 02:22 legoktm: removed user's 2FA on wikitech for T331955
  • 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json
  • 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json
  • 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json
  • 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json
  • 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json
  • 01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json
  • 01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json
  • 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json
  • 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json
  • 00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json
  • 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json

2023-03-13

  • 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json
  • 23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json
  • 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json
  • 23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json
  • 22:48 zabe@deploy2002: Finished scap: noc: Switch default selection on db.php from eqiad to codfw (duration: 06m 56s)
  • 22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json
  • 22:41 zabe@deploy2002: Started scap: noc: Switch default selection on db.php from eqiad to codfw
  • 22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s)
  • 22:40 zabe@deploy2002: Started scap: gerrit:898037
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json
  • 22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json
  • 22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s)
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json
  • 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json
  • 21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json
  • 21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye
  • 21:21 wfan: remove -d for jobs-dlocal queue runner
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json
  • 21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json
  • 21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 21:01 ejegg: enabled jobs-dlocal queue runner
  • 21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json
  • 20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye
  • 20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s)
  • 20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json
  • 20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json
  • 20:27 kindrobot: close UTC late backport window
  • 20:26 kindrobot@deploy2002: Finished scap: Backport for Add header at top of main page (T325362) (duration: 12m 11s)
  • 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json
  • 20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for Add header at top of main page (T325362) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:15 kindrobot: start UTC late backport window
  • 20:14 kindrobot@deploy2002: Started scap: Backport for Add header at top of main page (T325362)
  • 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json
  • 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json
  • 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json
  • 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json
  • 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json
  • 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json
  • 18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45796 and previous config saved to /var/cache/conftool/dbconfig/20230313-184502-marostegui.json
  • 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable (duration: 00m 13s)
  • 18:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable
  • 18:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:36 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date (duration: 00m 14s)
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45795 and previous config saved to /var/cache/conftool/dbconfig/20230313-183628-marostegui.json
  • 18:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45794 and previous config saved to /var/cache/conftool/dbconfig/20230313-182121-marostegui.json
  • 18:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 18:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45793 and previous config saved to /var/cache/conftool/dbconfig/20230313-180615-marostegui.json
  • 17:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 17:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45792 and previous config saved to /var/cache/conftool/dbconfig/20230313-175109-marostegui.json
  • 17:50 dancy@deploy2002: Finished scap: test cleanup (duration: 06m 40s)
  • 17:44 dancy@deploy2002: Started scap: test cleanup
  • 17:43 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45791 and previous config saved to /var/cache/conftool/dbconfig/20230313-174030-marostegui.json
  • 17:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45790 and previous config saved to /var/cache/conftool/dbconfig/20230313-174009-marostegui.json
  • 17:35 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45789 and previous config saved to /var/cache/conftool/dbconfig/20230313-172503-marostegui.json
  • 17:22 dancy@deploy2002: Finished scap: testing T329857 (duration: 06m 54s)
  • 17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 17:15 dancy@deploy2002: Started scap: testing T329857
  • 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:12 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 17:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:10 Emperor: roll-restart of codfw eqiad frontends
  • 17:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45788 and previous config saved to /var/cache/conftool/dbconfig/20230313-170955-marostegui.json
  • 17:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:08 dancy@deploy2002: Installation of scap version "4.46.0" completed for 553 hosts
  • 17:07 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
  • 17:04 bd808: Ran cache.purge_openstack_users() for Striker following deploy of e1f7491 (T331674)
  • 17:04 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45787 and previous config saved to /var/cache/conftool/dbconfig/20230313-165449-marostegui.json
  • 16:47 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45785 and previous config saved to /var/cache/conftool/dbconfig/20230313-164410-marostegui.json
  • 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45784 and previous config saved to /var/cache/conftool/dbconfig/20230313-164349-marostegui.json
  • 16:36 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45783 and previous config saved to /var/cache/conftool/dbconfig/20230313-162843-marostegui.json
  • 16:20 moritzm: imported tideways 5.0.4-2+wmf1+buster1+icu67u1 T329491
  • 16:18 dancy@deploy2002: Finished scap: testing (duration: 06m 53s)
  • 16:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 16:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 16:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45782 and previous config saved to /var/cache/conftool/dbconfig/20230313-161337-marostegui.json
  • 16:11 dancy@deploy2002: Started scap: testing
  • 16:06 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 15s)
  • 16:00 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 T329491
  • 16:00 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 43s)
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45781 and previous config saved to /var/cache/conftool/dbconfig/20230313-155830-marostegui.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45780 and previous config saved to /var/cache/conftool/dbconfig/20230313-154641-marostegui.json
  • 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:35 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 T329491
  • 15:31 dancy@deploy2002: Finished scap: testing T329857 (duration: 10m 08s)
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:21 dancy@deploy2002: Started scap: testing T329857
  • 15:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45779 and previous config saved to /var/cache/conftool/dbconfig/20230313-150523-marostegui.json
  • 15:03 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 14:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45778 and previous config saved to /var/cache/conftool/dbconfig/20230313-145016-marostegui.json
  • 14:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:38 jbond: disable puppet fleet wide to debug strange issue
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45777 and previous config saved to /var/cache/conftool/dbconfig/20230313-143510-marostegui.json
  • 14:23 claime: switch noc.wikimedia.org from eqiad to codfw - T331634
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45776 and previous config saved to /var/cache/conftool/dbconfig/20230313-142004-marostegui.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45774 and previous config saved to /var/cache/conftool/dbconfig/20230313-141409-marostegui.json
  • 14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45773 and previous config saved to /var/cache/conftool/dbconfig/20230313-141348-marostegui.json
  • 14:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45772 and previous config saved to /var/cache/conftool/dbconfig/20230313-135842-marostegui.json
  • 13:50 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 13:49 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 13:48 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 13:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6] (duration: 00m 11s)
  • 13:48 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6]
  • 13:47 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 13:46 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 13:45 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45770 and previous config saved to /var/cache/conftool/dbconfig/20230313-134336-marostegui.json
  • 13:40 moritzm: imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 T329491
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45769 and previous config saved to /var/cache/conftool/dbconfig/20230313-132829-marostegui.json
  • 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
  • 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
  • 13:23 taavi@deploy2002: Finished scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) (duration: 08m 10s)
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45768 and previous config saved to /var/cache/conftool/dbconfig/20230313-132123-marostegui.json
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45767 and previous config saved to /var/cache/conftool/dbconfig/20230313-132101-marostegui.json
  • 13:16 taavi@deploy2002: taavi and superpes: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:15 taavi@deploy2002: Started scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047)
  • 13:13 taavi@deploy2002: Finished scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) (duration: 09m 29s)
  • 13:11 moritzm: imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 T329491
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45766 and previous config saved to /var/cache/conftool/dbconfig/20230313-130555-marostegui.json
  • 13:05 taavi@deploy2002: stang and taavi: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 taavi@deploy2002: Started scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691)
  • 13:00 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1 T329491
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45764 and previous config saved to /var/cache/conftool/dbconfig/20230313-125049-marostegui.json
  • 12:48 hnowlan: restarting codfw thumbor instances to attempt to remedy 502 issues
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.codfw.wmnet
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.codfw.wmnet
  • 12:37 moritzm: imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1 T329491
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45763 and previous config saved to /var/cache/conftool/dbconfig/20230313-123543-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45762 and previous config saved to /var/cache/conftool/dbconfig/20230313-122928-marostegui.json
  • 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45761 and previous config saved to /var/cache/conftool/dbconfig/20230313-122906-marostegui.json
  • 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:19 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 T329491
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45760 and previous config saved to /var/cache/conftool/dbconfig/20230313-121400-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45759 and previous config saved to /var/cache/conftool/dbconfig/20230313-115854-marostegui.json
  • 11:58 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 T329491
  • 11:46 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1 T329491
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45758 and previous config saved to /var/cache/conftool/dbconfig/20230313-114348-marostegui.json
  • 11:31 moritzm: imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 T329491
  • 11:22 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
  • 11:21 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
  • 11:11 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 T329491
  • 10:55 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 T329491
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45757 and previous config saved to /var/cache/conftool/dbconfig/20230313-104322-marostegui.json
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45756 and previous config saved to /var/cache/conftool/dbconfig/20230313-104246-marostegui.json
  • 10:38 moritzm: imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 T329491
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45755 and previous config saved to /var/cache/conftool/dbconfig/20230313-102740-marostegui.json
  • 10:26 moritzm: imported php-defaults 7.4+76+wmf1~buster2+icu67u1 T329491
  • 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55701
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45754 and previous config saved to /var/cache/conftool/dbconfig/20230313-101234-marostegui.json
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55701
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38193
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38193
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46632
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46632
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6663
  • 10:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6663
  • 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45558
  • 10:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45558
  • 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
  • 10:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
  • 10:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 668
  • 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 668
  • 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:02 moritzm: imported dh-php 0.35+wmf1+buster1+icu67u1 T329491
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45753 and previous config saved to /var/cache/conftool/dbconfig/20230313-095728-marostegui.json
  • 09:55 vgutierrez: Enable haproxy hardening in cp hosts globally - T323944
  • 09:52 zabe@deploy2002: Finished scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] (duration: 07m 40s)
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45752 and previous config saved to /var/cache/conftool/dbconfig/20230313-095119-marostegui.json
  • 09:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45751 and previous config saved to /var/cache/conftool/dbconfig/20230313-095058-marostegui.json
  • 09:48 jayme: pcc-worker1003:~# rm -r /srv/jenkins/puppet-compiler/40076 - / back to 70%
  • 09:46 zabe@deploy2002: jforrester and zabe: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 09:45 jayme: pcc-worker1002:~# rm -r /srv/jenkins/puppet-compiler/40078 - / back to 47% usage
  • 09:44 zabe@deploy2002: Started scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply]
  • 09:44 zabe@deploy2002: Finished scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) (duration: 07m 52s)
  • 09:40 jayme: pcc-worker1001:~# rm -r /srv/jenkins/puppet-compiler/40079 /srv/jenkins/puppet-compiler/38943 - / back to 68% usage
  • 09:38 zabe@deploy2002: zabe: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:36 zabe@deploy2002: Started scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45750 and previous config saved to /var/cache/conftool/dbconfig/20230313-093552-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45749 and previous config saved to /var/cache/conftool/dbconfig/20230313-092045-marostegui.json
  • 09:16 moritzm: installing python-werkzeug security updates
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45748 and previous config saved to /var/cache/conftool/dbconfig/20230313-090539-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45747 and previous config saved to /var/cache/conftool/dbconfig/20230313-085937-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45746 and previous config saved to /var/cache/conftool/dbconfig/20230313-085916-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45745 and previous config saved to /var/cache/conftool/dbconfig/20230313-084409-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45744 and previous config saved to /var/cache/conftool/dbconfig/20230313-082903-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45743 and previous config saved to /var/cache/conftool/dbconfig/20230313-081357-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45742 and previous config saved to /var/cache/conftool/dbconfig/20230313-080759-marostegui.json
  • 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45741 and previous config saved to /var/cache/conftool/dbconfig/20230313-080738-marostegui.json
  • 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:02 moritzm: installing curl security updates
  • 07:58 zabe@deploy2002: Finished scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) (duration: 07m 02s)
  • 07:53 zabe@deploy2002: zabe: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45740 and previous config saved to /var/cache/conftool/dbconfig/20230313-075232-marostegui.json
  • 07:51 zabe@deploy2002: Started scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482)
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45739 and previous config saved to /var/cache/conftool/dbconfig/20230313-073725-marostegui.json
  • 07:37 marostegui: Remove pagetriage_log from enwiki T328309
  • 07:32 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) (duration: 17m 04s)
  • 07:25 kartik@deploy2002: kartik: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45738 and previous config saved to /var/cache/conftool/dbconfig/20230313-072219-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45737 and previous config saved to /var/cache/conftool/dbconfig/20230313-071522-marostegui.json
  • 07:15 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)
  • 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45736 and previous config saved to /var/cache/conftool/dbconfig/20230313-071501-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45735 and previous config saved to /var/cache/conftool/dbconfig/20230313-065954-marostegui.json
  • 06:52 marostegui_: Remove pagetriage_log from testwiki and test2wiki T328309
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45734 and previous config saved to /var/cache/conftool/dbconfig/20230313-064448-marostegui.json
  • 06:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9873
  • 06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9873
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9507
  • 06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9507
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15830
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15830
  • 06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9902
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9902
  • 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45733 and previous config saved to /var/cache/conftool/dbconfig/20230313-062942-marostegui.json
  • 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34549
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34549
  • 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
  • 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 29357
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45732 and previous config saved to /var/cache/conftool/dbconfig/20230313-062244-marostegui.json
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138886
  • 06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138886
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui_: Deploy schema change on s3 codfw dbmaint T329684
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:37 kart_: Updated cxserver to 2023-03-09-061555-production (T331097, T327102, T326541)
  • 04:19 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 04:19 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 04:17 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 04:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-03-12

  • 10:47 elukey: reset offsets on kafka jumbo for benthos webrequest live (as indicated in https://phabricator.wikimedia.org/T331801#8685569)
  • 07:50 elukey: restart benthos-webrequest-live on centrallog1002 - T331801
  • 07:49 elukey: restart benthos-webrequest-live on centrallog2002 - T331801
  • 07:49 elukey: stop and mask benthos-webrequest-live on centrallog1001 - T331801

2023-03-10

  • 22:43 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:32 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:26 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:16 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:24 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:14 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:13 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:03 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:43 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78] (duration: 00m 10s)
  • 20:43 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78]
  • 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:39 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 00m 09s)
  • 19:38 milimetric@deploy2002: Started deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942]
  • 19:38 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 08m 08s)
  • 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 milimetric@deploy2002: Started deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942]
  • 19:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
  • 19:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 19:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:55 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944] (duration: 00m 12s)
  • 18:55 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944]
  • 18:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 18:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 18:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 18:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 18:12 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 18:04 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:53 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:52 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:40 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:28 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 16:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 16:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 16:04 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2003-dev']
  • 16:04 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:59 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:59 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:57 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2003-dev']
  • 15:53 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:53 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:50 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:52 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:47 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:38 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:36 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:22 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:20 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:08 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
  • 13:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
  • 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:34 Emperor: restart swift-object-replicator on ms-be2067
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
  • 12:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
  • 12:46 moritzm: installing libsdl2 security updates
  • 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:31 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader1004.wikimedia.org with OS bullseye
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
  • 11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
  • 11:35 moritzm: installing isc-dhcp bugfix updates from DLA 3326
  • 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1004.wikimedia.org with OS bullseye
  • 11:04 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=jawiki --logwiki=metawiki --ignorestatus 'あ ーあーあーあーあー' 'ARIAUSO' # T331685
  • 11:03 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ZSTK Lublin' 'Sonabet4' # T331685
  • 11:01 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Yair.herman' 'Manor258' # T331685
  • 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Tranquill Komnin' 'Nevechear' # T331685
  • 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Tosikuni Japan' 'Revisionist14' # T331685
  • 10:54 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Studio 7 Piaseczno Jarosław Zawadzki' 'Jarosław Andrzej Zawadzki (muzyk)' # T331685
  • 10:52 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Siniy7' 'Viktorbublik' # T331685
  • 10:51 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki --ignorestatus 'Reza amjad(iran)' 'رضا امجد (تبریز)' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Mac700' 'Unknown001100' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'HonzaSTECH' 'ShadyMedic' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
  • 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mac700' 'Unknown001100' # T331685
  • 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'HonzaSTECH' 'ShadyMedic' # T331685
  • 10:40 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
  • 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
  • 09:57 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
  • 09:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 02:09 zabe@deploy2002: Finished scap: T331685 (duration: 07m 52s)
  • 02:02 zabe@deploy2002: Started scap: T331685
  • 02:01 zabe@deploy2002: Finished scap: T331685 (duration: 07m 28s)
  • 02:00 ejegg: SmashPig upgraded from c6775c60 to 3b84e4cb
  • 01:55 ejegg: payments-wiki upgraded from 05a5e09a to 61c30a4f
  • 01:54 zabe@deploy2002: Started scap: T331685
  • 01:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye

2023-03-09

  • 23:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting (duration: 00m 14s)
  • 23:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting
  • 23:33 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor (duration: 00m 14s)
  • 23:32 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor
  • 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 23:01 sukhe: pool new dns hosts dns1003 and dns2003: T330670
  • 22:53 sukhe: run homer in cr*-{codfw,eqiad} for CR 896190
  • 22:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2003.wikimedia.org with OS bullseye
  • 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:41 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:40 bd808: Forced puppet run on cloudweb100[34] to apply quick fix for T331674
  • 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
  • 22:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
  • 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1003.wikimedia.org with OS bullseye
  • 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
  • 22:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
  • 22:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
  • 22:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns2003.wikimedia.org with OS bullseye
  • 21:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
  • 21:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
  • 21:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 21:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
  • 21:38 TheresNoTime: close UTC late backport
  • 21:37 samtar@deploy2002: Finished scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) (duration: 10m 43s)
  • 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 21:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 21:28 samtar@deploy2002: samtar and nray: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:27 samtar@deploy2002: Started scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)
  • 21:24 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part II of II (duration: 07m 38s)
  • 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
  • 21:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
  • 21:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
  • 21:18 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part II of II synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:17 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part II of II
  • 21:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:14 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part I of II (duration: 12m 19s)
  • 21:10 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns2003
  • 21:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 21:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns2003
  • 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
  • 21:08 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
  • 21:07 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 21:03 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part I of II synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.mgmt.codfw.wmnet on all recursors
  • 21:02 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.mgmt.codfw.wmnet on all recursors
  • 21:02 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part I of II
  • 20:59 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.wikimedia.org on all recursors
  • 20:59 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.wikimedia.org on all recursors
  • 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
  • 20:46 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
  • 20:44 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 20:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1003.wikimedia.org']
  • 20:30 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1003.wikimedia.org']
  • 20:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 20:24 topranks: move cloud-hosts1-b-codfw GW from core routers to cloudsw1-b1-codfw T327919
  • 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 20:12 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns1003.wikimedia.org on all recursors
  • 20:12 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns1003.wikimedia.org on all recursors
  • 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 20:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 20:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
  • 19:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
  • 19:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
  • 19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
  • 19:15 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
  • 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 19:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.26 refs T330204
  • 19:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:53 sukhe: enable puppet on A:dns-rec and force puppet run: T330670
  • 18:50 mforns@deploy2002: Finished deploy [airflow-dags/analytics@3419b7d]: (no justification provided) (duration: 00m 10s)
  • 18:50 mforns@deploy2002: Started deploy [airflow-dags/analytics@3419b7d]: (no justification provided)
  • 18:47 sukhe: enable puppet on dns4003 to merge 895894
  • 18:44 sukhe: disable puppet on A:dns-rec to merge CR 895894
  • 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:34 sukhe: [correction] homer "cr*-codfw*" commit "Remove authdns2001 from homer, T330670"
  • 18:34 sukhe: homer "cr*-codfw*" commit "Remove authdns1001 from homer, T330670"
  • 18:31 sukhe: homer "cr*-eqiad*" commit "Remove authdns1001 from homer, T330670"
  • 18:26 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts authdns[1001,2001].wikimedia.org
  • 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:24 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:22 sukhe: running puppet-agent on A:dns-auth to remove deprecated authdns[12]001
  • 18:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:15 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts authdns[1001,2001].wikimedia.org
  • 18:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:00 sukhe: cr*-codfw [ns0]: set routing-options static route 208.80.154.238/32 next-hop 208.80.153.77: T330670
  • 17:53 sukhe: cr*-codfw [ns1]: set routing-options static route 208.80.153.231/32 next-hop 208.80.153.77: T330670
  • 17:50 zabe@deploy2002: Finished scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) (duration: 11m 57s)
  • 17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45725 and previous config saved to /var/cache/conftool/dbconfig/20230309-174723-marostegui.json
  • 17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:42 sukhe: [ns1] set routing-options static route 208.80.153.231/32 next-hop 208.80.154.10: T330670
  • 17:39 zabe@deploy2002: zabe and ssastry: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:38 zabe@deploy2002: Started scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629)
  • 17:37 sukhe: cr2-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
  • 17:37 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
  • 17:36 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45724 and previous config saved to /var/cache/conftool/dbconfig/20230309-173217-marostegui.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45723 and previous config saved to /var/cache/conftool/dbconfig/20230309-171711-marostegui.json
  • 17:13 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (prod links) T327919
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45722 and previous config saved to /var/cache/conftool/dbconfig/20230309-170205-marostegui.json
  • 16:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45721 and previous config saved to /var/cache/conftool/dbconfig/20230309-165210-marostegui.json
  • 16:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45720 and previous config saved to /var/cache/conftool/dbconfig/20230309-165149-marostegui.json
  • 16:51 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (cloud vrf) T327919
  • 16:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45719 and previous config saved to /var/cache/conftool/dbconfig/20230309-163643-marostegui.json
  • 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45718 and previous config saved to /var/cache/conftool/dbconfig/20230309-162608-root.json
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45717 and previous config saved to /var/cache/conftool/dbconfig/20230309-162137-marostegui.json
  • 16:18 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief1001.eqiad.wmnet with OS bullseye
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45716 and previous config saved to /var/cache/conftool/dbconfig/20230309-161103-root.json
  • 16:09 zabe@deploy2002: Finished scap: T308932 (duration: 07m 19s)
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45715 and previous config saved to /var/cache/conftool/dbconfig/20230309-160630-marostegui.json
  • 16:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:03 marostegui: Restart mailman service T331626
  • 16:02 zabe@deploy2002: Started scap: T308932
  • 16:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:00 marostegui: Failover m5 from db1183 to db1176 - T330847
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45714 and previous config saved to /var/cache/conftool/dbconfig/20230309-155558-root.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45713 and previous config saved to /var/cache/conftool/dbconfig/20230309-155520-marostegui.json
  • 15:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45712 and previous config saved to /var/cache/conftool/dbconfig/20230309-155459-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45711 and previous config saved to /var/cache/conftool/dbconfig/20230309-154053-root.json
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45710 and previous config saved to /var/cache/conftool/dbconfig/20230309-153953-marostegui.json
  • 15:29 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief1001.eqiad.wmnet with OS bullseye
  • 15:27 brett: Enable puppet on R:acme_chief::cert - T321309
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45709 and previous config saved to /var/cache/conftool/dbconfig/20230309-152447-marostegui.json
  • 15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
  • 15:15 moritzm: installing PHP 7.3 security updates (as shipped in Debian)
  • 15:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
  • 15:14 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:13 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:11 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45707 and previous config saved to /var/cache/conftool/dbconfig/20230309-151100-marostegui.json
  • 15:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 15:10 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:10 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45706 and previous config saved to /var/cache/conftool/dbconfig/20230309-150940-marostegui.json
  • 15:06 brett: Disable puppet on R:acme_chief::cert for acmechief maintenance - T321309
  • 15:04 zabe@deploy2002: Finished scap: Backport for Drop unused FlaggedRevs threshold level names (T277883) (duration: 10m 48s)
  • 15:04 TheresNoTime: close UTC afternoon backport window
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
  • 15:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
  • 15:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:56 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:55 zabe@deploy2002: awight and zabe: Backport for Drop unused FlaggedRevs threshold level names (T277883) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:55 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:54 zabe@deploy2002: Started scap: Backport for Drop unused FlaggedRevs threshold level names (T277883)
  • 14:34 moritzm: installing apr security updates
  • 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 14:30 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f774711]: (no justification provided) (duration: 19m 03s)
  • 14:13 samtar@deploy2002: Finished scap: Backport for Bump parsoid parser cache writes to 50%. (T320534) (duration: 07m 28s)
  • 14:11 jgiannelos@deploy2002: Started deploy [restbase/deploy@f774711]: (no justification provided)
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45705 and previous config saved to /var/cache/conftool/dbconfig/20230309-140915-marostegui.json
  • 14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45704 and previous config saved to /var/cache/conftool/dbconfig/20230309-140850-marostegui.json
  • 14:08 Emperor: testing disk-swap in ms-be1066 T329305
  • 14:07 samtar@deploy2002: daniel and samtar: Backport for Bump parsoid parser cache writes to 50%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:05 samtar@deploy2002: Started scap: Backport for Bump parsoid parser cache writes to 50%. (T320534)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45703 and previous config saved to /var/cache/conftool/dbconfig/20230309-140510-marostegui.json
  • 14:00 aqu@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b] (duration: 00m 13s)
  • 14:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b]
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45702 and previous config saved to /var/cache/conftool/dbconfig/20230309-135343-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45701 and previous config saved to /var/cache/conftool/dbconfig/20230309-135004-marostegui.json
  • 13:42 moritzm: restarting FPM/Apache on mw canaries to pick up curl updates
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45700 and previous config saved to /var/cache/conftool/dbconfig/20230309-133837-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45699 and previous config saved to /var/cache/conftool/dbconfig/20230309-133458-marostegui.json
  • 13:34 moritzm: installing curl security updates
  • 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
  • 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45698 and previous config saved to /var/cache/conftool/dbconfig/20230309-132331-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45697 and previous config saved to /var/cache/conftool/dbconfig/20230309-131951-marostegui.json
  • 13:17 vgutierrez: rolling restart of pybal in lvs2009 and lvs2010
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45696 and previous config saved to /var/cache/conftool/dbconfig/20230309-131136-marostegui.json
  • 13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
  • 13:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:03 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45695 and previous config saved to /var/cache/conftool/dbconfig/20230309-130315-marostegui.json
  • 12:57 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=aqs,dc=codfw
  • 12:55 btullis@puppetmaster1001: conftool action : set/weight=10; selector: cluster=aqs,dc=codfw
  • 12:53 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs2001.codfw.wmnet
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45694 and previous config saved to /var/cache/conftool/dbconfig/20230309-124809-marostegui.json
  • 12:46 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45693 and previous config saved to /var/cache/conftool/dbconfig/20230309-124025-marostegui.json
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45692 and previous config saved to /var/cache/conftool/dbconfig/20230309-124004-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45691 and previous config saved to /var/cache/conftool/dbconfig/20230309-123303-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45690 and previous config saved to /var/cache/conftool/dbconfig/20230309-123015-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45689 and previous config saved to /var/cache/conftool/dbconfig/20230309-122458-marostegui.json
  • 12:22 moritzm: rebalancing ganeti eqiad/C after completion of bullseye updates T311687
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45688 and previous config saved to /var/cache/conftool/dbconfig/20230309-121756-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45687 and previous config saved to /var/cache/conftool/dbconfig/20230309-121510-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45686 and previous config saved to /var/cache/conftool/dbconfig/20230309-120951-marostegui.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45685 and previous config saved to /var/cache/conftool/dbconfig/20230309-120559-marostegui.json
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45684 and previous config saved to /var/cache/conftool/dbconfig/20230309-120537-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45683 and previous config saved to /var/cache/conftool/dbconfig/20230309-120005-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45682 and previous config saved to /var/cache/conftool/dbconfig/20230309-115445-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45681 and previous config saved to /var/cache/conftool/dbconfig/20230309-115031-marostegui.json
  • 11:47 marostegui: Deploy schema change on s1 codfw dbmaint T329684
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45680 and previous config saved to /var/cache/conftool/dbconfig/20230309-114500-root.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45679 and previous config saved to /var/cache/conftool/dbconfig/20230309-114338-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:40 moritzm: installing git security updates
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45678 and previous config saved to /var/cache/conftool/dbconfig/20230309-113525-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45677 and previous config saved to /var/cache/conftool/dbconfig/20230309-112804-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45676 and previous config saved to /var/cache/conftool/dbconfig/20230309-112739-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45675 and previous config saved to /var/cache/conftool/dbconfig/20230309-112019-marostegui.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45674 and previous config saved to /var/cache/conftool/dbconfig/20230309-111233-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45673 and previous config saved to /var/cache/conftool/dbconfig/20230309-110827-marostegui.json
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45672 and previous config saved to /var/cache/conftool/dbconfig/20230309-110806-marostegui.json
  • 11:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 11:01 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 11:00 otto@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Step 2b: InitialiseSettings.php - remove duplicate configs - T308932 (duration: 06m 37s)
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45671 and previous config saved to /var/cache/conftool/dbconfig/20230309-105726-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45670 and previous config saved to /var/cache/conftool/dbconfig/20230309-105259-marostegui.json
  • 10:50 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Step 2a: ext-EventLogging.php - remove duplicate configs - T308932 (duration: 06m 32s)
  • 10:47 topranks: Resetting PIC in slot 1/0 on cr2-codfw T331527
  • 10:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45669 and previous config saved to /var/cache/conftool/dbconfig/20230309-104220-marostegui.json
  • 10:39 otto@deploy2002: Synchronized multiversion/MWConfigCacheGenerator.php: Step 1b: MWConfigCacheGenerator.php - load ext-EventStreamConfig.php - T308932 (duration: 06m 23s)
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45668 and previous config saved to /var/cache/conftool/dbconfig/20230309-103753-marostegui.json
  • 10:32 hashar@deploy2002: Finished deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others (duration: 00m 08s)
  • 10:32 hashar@deploy2002: Started deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others
  • 10:29 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Step 1a: ext-EventStreamConfig.php - wgEventStreams lives here - T308932 (duration: 06m 43s)
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:23 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45667 and previous config saved to /var/cache/conftool/dbconfig/20230309-102247-marostegui.json
  • 10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:22 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:22 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:22 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:21 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:21 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 10:19 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:19 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:12 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45666 and previous config saved to /var/cache/conftool/dbconfig/20230309-101042-marostegui.json
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45665 and previous config saved to /var/cache/conftool/dbconfig/20230309-101020-marostegui.json
  • 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45664 and previous config saved to /var/cache/conftool/dbconfig/20230309-100611-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:01 topranks: commencing work to drain cr2-codfw ports on card 1/0 (T331601)
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 09:55 marostegui: Deploy schema change on s4 codfw dbmaint T329684
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45663 and previous config saved to /var/cache/conftool/dbconfig/20230309-095514-marostegui.json
  • 09:53 marostegui: Deploy schema change on s8 codfw dbmaint T329684
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 09:48 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45662 and previous config saved to /var/cache/conftool/dbconfig/20230309-094602-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45661 and previous config saved to /var/cache/conftool/dbconfig/20230309-094008-marostegui.json
  • 09:33 topranks: resetting PIC 1/0 on cr1-codfw
  • 09:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
  • 09:32 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45660 and previous config saved to /var/cache/conftool/dbconfig/20230309-093120-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45659 and previous config saved to /var/cache/conftool/dbconfig/20230309-093057-root.json
  • 09:29 elukey: delete old/unused ML-related docker images from the registry - T331513
  • 09:27 topranks: disabling Transit cct on cr1-codfw xe-1/0/1:0 (T331527)
  • 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45658 and previous config saved to /var/cache/conftool/dbconfig/20230309-092502-marostegui.json
  • 09:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1011.eqiad.wmnet with OS bullseye
  • 09:21 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
  • 09:20 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
  • 09:19 marostegui: Deploy schema change on s7 codfw dbmaint T329684
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45657 and previous config saved to /var/cache/conftool/dbconfig/20230309-091613-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45656 and previous config saved to /var/cache/conftool/dbconfig/20230309-091552-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45655 and previous config saved to /var/cache/conftool/dbconfig/20230309-091400-marostegui.json
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:13 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45654 and previous config saved to /var/cache/conftool/dbconfig/20230309-091338-marostegui.json
  • 09:13 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:12 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45653 and previous config saved to /var/cache/conftool/dbconfig/20230309-090107-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45652 and previous config saved to /var/cache/conftool/dbconfig/20230309-090048-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45651 and previous config saved to /var/cache/conftool/dbconfig/20230309-085832-marostegui.json
  • 08:54 marostegui: Deploy schema change on s2 codfw dbmaint T329684
  • 08:54 marostegui: Deploy schema change on s5 codfw dbmaint T329684
  • 08:54 marostegui: Deploy schema change on s6 codfw dbmaint T329684
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS bullseye
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45650 and previous config saved to /var/cache/conftool/dbconfig/20230309-084601-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45649 and previous config saved to /var/cache/conftool/dbconfig/20230309-084543-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45648 and previous config saved to /var/cache/conftool/dbconfig/20230309-084359-marostegui.json
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45647 and previous config saved to /var/cache/conftool/dbconfig/20230309-084326-marostegui.json
  • 08:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:39 taavi@deploy2002: Finished scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) (duration: 11m 37s)
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45646 and previous config saved to /var/cache/conftool/dbconfig/20230309-083802-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45645 and previous config saved to /var/cache/conftool/dbconfig/20230309-083604-root.json
  • 08:33 moritzm: remove ganeti1011 for eventual reimage T311687
  • 08:30 taavi@deploy2002: taavi and kharlan: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45644 and previous config saved to /var/cache/conftool/dbconfig/20230309-082820-marostegui.json
  • 08:28 taavi@deploy2002: Started scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264)
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
  • 08:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45643 and previous config saved to /var/cache/conftool/dbconfig/20230309-082257-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45642 and previous config saved to /var/cache/conftool/dbconfig/20230309-082059-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45641 and previous config saved to /var/cache/conftool/dbconfig/20230309-081707-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45640 and previous config saved to /var/cache/conftool/dbconfig/20230309-081646-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45639 and previous config saved to /var/cache/conftool/dbconfig/20230309-080858-marostegui.json
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45638 and previous config saved to /var/cache/conftool/dbconfig/20230309-080837-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45637 and previous config saved to /var/cache/conftool/dbconfig/20230309-080752-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45636 and previous config saved to /var/cache/conftool/dbconfig/20230309-080555-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45635 and previous config saved to /var/cache/conftool/dbconfig/20230309-080140-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45634 and previous config saved to /var/cache/conftool/dbconfig/20230309-075331-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45633 and previous config saved to /var/cache/conftool/dbconfig/20230309-075247-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45632 and previous config saved to /var/cache/conftool/dbconfig/20230309-075050-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45631 and previous config saved to /var/cache/conftool/dbconfig/20230309-074633-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45630 and previous config saved to /var/cache/conftool/dbconfig/20230309-073825-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45629 and previous config saved to /var/cache/conftool/dbconfig/20230309-073743-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45628 and previous config saved to /var/cache/conftool/dbconfig/20230309-073545-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45627 and previous config saved to /var/cache/conftool/dbconfig/20230309-073127-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45626 and previous config saved to /var/cache/conftool/dbconfig/20230309-072319-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45625 and previous config saved to /var/cache/conftool/dbconfig/20230309-072238-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45624 and previous config saved to /var/cache/conftool/dbconfig/20230309-072040-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329684)', diff saved to https://phabricator.wikimedia.org/P45623 and previous config saved to /var/cache/conftool/dbconfig/20230309-071853-marostegui.json
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45622 and previous config saved to /var/cache/conftool/dbconfig/20230309-071809-marostegui.json
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui: Deploy schema change on s3 eqiad dbmaint T329684
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 07:13 marostegui: Deploy schema change on s7 eqiad dbmaint T329684
  • 07:13 marostegui: Deploy schema change on s8 eqiad dbmaint T329684
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P45621 and previous config saved to /var/cache/conftool/dbconfig/20230309-071029-root.json
  • 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45620 and previous config saved to /var/cache/conftool/dbconfig/20230309-070805-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45619 and previous config saved to /var/cache/conftool/dbconfig/20230309-070733-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45618 and previous config saved to /var/cache/conftool/dbconfig/20230309-070658-marostegui.json
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329684)', diff saved to https://phabricator.wikimedia.org/P45617 and previous config saved to /var/cache/conftool/dbconfig/20230309-070327-marostegui.json
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45616 and previous config saved to /var/cache/conftool/dbconfig/20230309-070223-marostegui.json
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui: Deploy schema change on s1 eqiad dbmaint T329684
  • 06:48 marostegui: Deploy schema change on s4 eqiad dbmaint T329684
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45615 and previous config saved to /var/cache/conftool/dbconfig/20230309-064538-marostegui.json
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:43 marostegui: Deploy schema change on s2 eqiad dbmaint T329684
  • 06:42 marostegui: Deploy schema change on s5 eqiad dbmaint T329684
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
  • 06:40 marostegui: Deploy schema change on s6 eqiad dbmaint T329684
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 04:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45614 and previous config saved to /var/cache/conftool/dbconfig/20230309-040925-marostegui.json
  • 03:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45613 and previous config saved to /var/cache/conftool/dbconfig/20230309-035418-marostegui.json
  • 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45612 and previous config saved to /var/cache/conftool/dbconfig/20230309-033912-marostegui.json
  • 03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45611 and previous config saved to /var/cache/conftool/dbconfig/20230309-032406-marostegui.json
  • 03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45610 and previous config saved to /var/cache/conftool/dbconfig/20230309-030445-marostegui.json
  • 03:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 03:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45609 and previous config saved to /var/cache/conftool/dbconfig/20230309-030424-marostegui.json
  • 02:59 sukhe: run keyholder arm on acmechief2001
  • 02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45608 and previous config saved to /var/cache/conftool/dbconfig/20230309-024917-marostegui.json
  • 02:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45607 and previous config saved to /var/cache/conftool/dbconfig/20230309-023411-marostegui.json
  • 02:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45606 and previous config saved to /var/cache/conftool/dbconfig/20230309-021905-marostegui.json
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45604 and previous config saved to /var/cache/conftool/dbconfig/20230309-015831-marostegui.json
  • 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45603 and previous config saved to /var/cache/conftool/dbconfig/20230309-015810-marostegui.json
  • 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45602 and previous config saved to /var/cache/conftool/dbconfig/20230309-014303-marostegui.json
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45601 and previous config saved to /var/cache/conftool/dbconfig/20230309-012757-marostegui.json
  • 01:18 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors (duration: 00m 13s)
  • 01:18 ebernhardson@deploy2002: Started deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors
  • 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45600 and previous config saved to /var/cache/conftool/dbconfig/20230309-011251-marostegui.json
  • 00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45599 and previous config saved to /var/cache/conftool/dbconfig/20230309-005220-marostegui.json
  • 00:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45598 and previous config saved to /var/cache/conftool/dbconfig/20230309-005210-marostegui.json
  • 00:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45597 and previous config saved to /var/cache/conftool/dbconfig/20230309-003703-marostegui.json
  • 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45596 and previous config saved to /var/cache/conftool/dbconfig/20230309-002157-marostegui.json
  • 00:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45594 and previous config saved to /var/cache/conftool/dbconfig/20230309-000651-marostegui.json

2023-03-08

  • 23:50 zabe@deploy2002: Finished scap: T308932 (duration: 07m 15s)
  • 23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45593 and previous config saved to /var/cache/conftool/dbconfig/20230308-234534-marostegui.json
  • 23:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 23:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45592 and previous config saved to /var/cache/conftool/dbconfig/20230308-234502-marostegui.json
  • 23:42 zabe@deploy2002: Started scap: T308932
  • 23:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths (duration: 00m 14s)
  • 23:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths
  • 23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45591 and previous config saved to /var/cache/conftool/dbconfig/20230308-232956-marostegui.json
  • 23:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45590 and previous config saved to /var/cache/conftool/dbconfig/20230308-231449-marostegui.json
  • 22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45589 and previous config saved to /var/cache/conftool/dbconfig/20230308-225943-marostegui.json
  • 22:44 hashar: Upgrading CI Jenkins
  • 22:42 tgr: UTC late deploys done
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45588 and previous config saved to /var/cache/conftool/dbconfig/20230308-224044-marostegui.json
  • 22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45587 and previous config saved to /var/cache/conftool/dbconfig/20230308-224018-marostegui.json
  • 22:39 tgr@deploy2002: Finished scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) (duration: 08m 31s)
  • 22:32 tgr@deploy2002: tgr: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:30 tgr@deploy2002: Started scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524)
  • 22:29 tgr@deploy2002: Finished scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) (duration: 07m 43s)
  • 22:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 22:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45586 and previous config saved to /var/cache/conftool/dbconfig/20230308-222512-marostegui.json
  • 22:23 tgr@deploy2002: tgr: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:21 tgr@deploy2002: Started scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412)
  • 22:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 22:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 22:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45585 and previous config saved to /var/cache/conftool/dbconfig/20230308-221006-marostegui.json
  • 22:09 kindrobot: hand off backport window UTC late to tgr for self-service
  • 22:07 kindrobot@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) (duration: 09m 36s)
  • 21:59 kindrobot@deploy2002: sbailey and kindrobot: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:57 kindrobot@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612)
  • 21:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45584 and previous config saved to /var/cache/conftool/dbconfig/20230308-215500-marostegui.json
  • 21:54 kindrobot@deploy2002: Finished scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) (duration: 07m 49s)
  • 21:48 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.cod
  • 21:46 kindrobot@deploy2002: Started scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588)
  • 21:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 21:30 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqi
  • 21:29 kindrobot@deploy2002: Started scap: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444)
  • 21:22 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix (duration: 00m 05s)
  • 21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix
  • 21:19 kindrobot: start UTC-late backport window
  • 21:08 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided) (duration: 01m 01s)
  • 21:07 hashar@deploy2002: Started deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided)
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45583 and previous config saved to /var/cache/conftool/dbconfig/20230308-205435-marostegui.json
  • 20:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45582 and previous config saved to /var/cache/conftool/dbconfig/20230308-205414-marostegui.json
  • 20:51 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief2001.codfw.wmnet with OS bullseye
  • 20:41 mutante: deploy2002 - systemctl restart keyholder-proxy.service to fix T331568 - after this SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -i /etc/keyholder.d/deploy_jenkins -l deploy-jenkins releases1002.eqiad.wmnet works
  • 20:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 20:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45581 and previous config saved to /var/cache/conftool/dbconfig/20230308-203907-marostegui.json
  • 20:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 20:24 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief2001.codfw.wmnet with OS bullseye
  • 20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45580 and previous config saved to /var/cache/conftool/dbconfig/20230308-202401-marostegui.json
  • 20:18 urandom: power cycle restbase2022 (unresponsive; cannot SSH)
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45579 and previous config saved to /var/cache/conftool/dbconfig/20230308-200855-marostegui.json
  • 20:01 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bullseye
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45578 and previous config saved to /var/cache/conftool/dbconfig/20230308-194646-marostegui.json
  • 19:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45577 and previous config saved to /var/cache/conftool/dbconfig/20230308-194625-marostegui.json
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 19:31 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test1001.eqiad.wmnet with OS bullseye
  • 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45576 and previous config saved to /var/cache/conftool/dbconfig/20230308-193118-marostegui.json
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45575 and previous config saved to /var/cache/conftool/dbconfig/20230308-191612-marostegui.json
  • 19:16 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.26 refs T330204 (duration: 06m 16s)
  • 19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
  • 19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
  • 19:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.26 refs T330204
  • 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief-test2001.codfw.wmnet
  • 19:09 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief-test2001.codfw.wmnet
  • 19:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45574 and previous config saved to /var/cache/conftool/dbconfig/20230308-190106-marostegui.json
  • 18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45573 and previous config saved to /var/cache/conftool/dbconfig/20230308-184328-marostegui.json
  • 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45572 and previous config saved to /var/cache/conftool/dbconfig/20230308-184204-marostegui.json
  • 18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45571 and previous config saved to /var/cache/conftool/dbconfig/20230308-184143-marostegui.json
  • 18:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45570 and previous config saved to /var/cache/conftool/dbconfig/20230308-183020-ladsgroup.json
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45569 and previous config saved to /var/cache/conftool/dbconfig/20230308-182822-marostegui.json
  • 18:28 inflatador: bking@cumin2002 repool elastic1060-1066 to finish off T322082
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45568 and previous config saved to /var/cache/conftool/dbconfig/20230308-182726-marostegui.json
  • 18:27 inflatador: bking@cumin2002 unban elastic1060-1066 to finish off T322082
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45567 and previous config saved to /var/cache/conftool/dbconfig/20230308-182637-marostegui.json
  • 18:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update locatoin of elastic1064-65 - bking@cumin2002 - T322082"
  • 18:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1064-65 - bking@cumin2002 - T322082"
  • 18:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test2001.codfw.wmnet with OS bullseye
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45566 and previous config saved to /var/cache/conftool/dbconfig/20230308-181514-ladsgroup.json
  • 18:14 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:13 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "update locatoin of elastic1065 - bking@cumin2002 - T322082"
  • 18:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1065 - bking@cumin2002 - T322082"
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45565 and previous config saved to /var/cache/conftool/dbconfig/20230308-181316-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45564 and previous config saved to /var/cache/conftool/dbconfig/20230308-181220-marostegui.json
  • 18:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update locatoin of elastic1064 - bking@cumin2002 - T322082"
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45563 and previous config saved to /var/cache/conftool/dbconfig/20230308-181131-marostegui.json
  • 18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update locatoin of elastic1064 - bking@cumin2002 - T322082"
  • 18:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
  • 18:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 18:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45562 and previous config saved to /var/cache/conftool/dbconfig/20230308-180008-ladsgroup.json
  • 17:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 17:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 17:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
  • 17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 17:58 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 17:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45561 and previous config saved to /var/cache/conftool/dbconfig/20230308-175810-marostegui.json
  • 17:58 herron: failing grafana over from codfw to eqiad
  • 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45560 and previous config saved to /var/cache/conftool/dbconfig/20230308-175714-marostegui.json
  • 17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45559 and previous config saved to /var/cache/conftool/dbconfig/20230308-175625-marostegui.json
  • 17:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:48 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:47 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:47 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test2001.codfw.wmnet with OS bullseye
  • 17:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45558 and previous config saved to /var/cache/conftool/dbconfig/20230308-174535-marostegui.json
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:45 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45557 and previous config saved to /var/cache/conftool/dbconfig/20230308-174514-marostegui.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45556 and previous config saved to /var/cache/conftool/dbconfig/20230308-174501-ladsgroup.json
  • 17:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45555 and previous config saved to /var/cache/conftool/dbconfig/20230308-174208-marostegui.json
  • 17:38 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45554 and previous config saved to /var/cache/conftool/dbconfig/20230308-173701-marostegui.json
  • 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45553 and previous config saved to /var/cache/conftool/dbconfig/20230308-173640-marostegui.json
  • 17:34 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:34 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45552 and previous config saved to /var/cache/conftool/dbconfig/20230308-173125-marostegui.json
  • 17:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45551 and previous config saved to /var/cache/conftool/dbconfig/20230308-173104-marostegui.json
  • 17:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45550 and previous config saved to /var/cache/conftool/dbconfig/20230308-173007-marostegui.json
  • 17:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45549 and previous config saved to /var/cache/conftool/dbconfig/20230308-172134-marostegui.json
  • 17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45548 and previous config saved to /var/cache/conftool/dbconfig/20230308-171558-marostegui.json
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45547 and previous config saved to /var/cache/conftool/dbconfig/20230308-171501-marostegui.json
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45546 and previous config saved to /var/cache/conftool/dbconfig/20230308-170627-marostegui.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45545 and previous config saved to /var/cache/conftool/dbconfig/20230308-170512-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45543 and previous config saved to /var/cache/conftool/dbconfig/20230308-170051-marostegui.json
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45542 and previous config saved to /var/cache/conftool/dbconfig/20230308-165955-marostegui.json
  • 16:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45541 and previous config saved to /var/cache/conftool/dbconfig/20230308-165121-marostegui.json
  • 16:49 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45540 and previous config saved to /var/cache/conftool/dbconfig/20230308-164807-marostegui.json
  • 16:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45539 and previous config saved to /var/cache/conftool/dbconfig/20230308-164746-marostegui.json
  • 16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45538 and previous config saved to /var/cache/conftool/dbconfig/20230308-164545-marostegui.json
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:35 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:34 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
  • 16:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45537 and previous config saved to /var/cache/conftool/dbconfig/20230308-163311-marostegui.json
  • 16:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45536 and previous config saved to /var/cache/conftool/dbconfig/20230308-163249-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45535 and previous config saved to /var/cache/conftool/dbconfig/20230308-163240-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45534 and previous config saved to /var/cache/conftool/dbconfig/20230308-163230-marostegui.json
  • 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1060 - bking@cumin2002 - T322082"
  • 16:28 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1060 - bking@cumin2002 - T322082"
  • 16:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
  • 16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 16:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45533 and previous config saved to /var/cache/conftool/dbconfig/20230308-161737-marostegui.json
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45532 and previous config saved to /var/cache/conftool/dbconfig/20230308-161727-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:10 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
  • 16:08 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1062.eqiad.wmnet
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
  • 16:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 4 hosts
  • 16:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 4 hosts
  • 16:03 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45531 and previous config saved to /var/cache/conftool/dbconfig/20230308-160231-marostegui.json
  • 16:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45530 and previous config saved to /var/cache/conftool/dbconfig/20230308-160221-marostegui.json
  • 16:00 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1062.eqiad.wmnet
  • 16:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1061.eqiad.wmnet
  • 15:59 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:54 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1061.eqiad.wmnet
  • 15:52 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:52 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:50 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:49 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45529 and previous config saved to /var/cache/conftool/dbconfig/20230308-154736-marostegui.json
  • 15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45528 and previous config saved to /var/cache/conftool/dbconfig/20230308-154724-marostegui.json
  • 15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45527 and previous config saved to /var/cache/conftool/dbconfig/20230308-154709-marostegui.json
  • 15:46 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:42 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:33 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45526 and previous config saved to /var/cache/conftool/dbconfig/20230308-153202-marostegui.json
  • 15:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 15:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 15:22 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Fix typo in rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 06m 41s)
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45525 and previous config saved to /var/cache/conftool/dbconfig/20230308-151656-marostegui.json
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Declare rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 11m 33s)
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45524 and previous config saved to /var/cache/conftool/dbconfig/20230308-150150-marostegui.json
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45523 and previous config saved to /var/cache/conftool/dbconfig/20230308-145245-marostegui.json
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45522 and previous config saved to /var/cache/conftool/dbconfig/20230308-144934-marostegui.json
  • 14:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45521 and previous config saved to /var/cache/conftool/dbconfig/20230308-144924-marostegui.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45520 and previous config saved to /var/cache/conftool/dbconfig/20230308-144659-marostegui.json
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45519 and previous config saved to /var/cache/conftool/dbconfig/20230308-144634-marostegui.json
  • 14:46 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:45 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:43 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:41 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:41 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:38 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:37 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45518 and previous config saved to /var/cache/conftool/dbconfig/20230308-143739-marostegui.json
  • 14:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45517 and previous config saved to /var/cache/conftool/dbconfig/20230308-143418-marostegui.json
  • 14:34 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:33 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:32 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45516 and previous config saved to /var/cache/conftool/dbconfig/20230308-143127-marostegui.json
  • 14:25 inflatador: bking@cumin2002 powering down elastic1060-66 for re-rack T322082
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45514 and previous config saved to /var/cache/conftool/dbconfig/20230308-142233-marostegui.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45513 and previous config saved to /var/cache/conftool/dbconfig/20230308-141911-marostegui.json
  • 14:16 TheresNoTime: close UTC afternoon backport window
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45511 and previous config saved to /var/cache/conftool/dbconfig/20230308-141621-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45510 and previous config saved to /var/cache/conftool/dbconfig/20230308-140727-marostegui.json
  • 14:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45509 and previous config saved to /var/cache/conftool/dbconfig/20230308-140405-marostegui.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45508 and previous config saved to /var/cache/conftool/dbconfig/20230308-140115-marostegui.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45507 and previous config saved to /var/cache/conftool/dbconfig/20230308-135153-marostegui.json
  • 13:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45506 and previous config saved to /var/cache/conftool/dbconfig/20230308-135132-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45505 and previous config saved to /var/cache/conftool/dbconfig/20230308-134945-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45504 and previous config saved to /var/cache/conftool/dbconfig/20230308-134034-marostegui.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45503 and previous config saved to /var/cache/conftool/dbconfig/20230308-134002-marostegui.json
  • 13:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45502 and previous config saved to /var/cache/conftool/dbconfig/20230308-133940-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45501 and previous config saved to /var/cache/conftool/dbconfig/20230308-133626-marostegui.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45500 and previous config saved to /var/cache/conftool/dbconfig/20230308-132528-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45499 and previous config saved to /var/cache/conftool/dbconfig/20230308-132434-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45498 and previous config saved to /var/cache/conftool/dbconfig/20230308-132120-marostegui.json
  • 13:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host urldownloader1003.wikimedia.org with OS bullseye
  • 13:11 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 13:11 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
  • 13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45497 and previous config saved to /var/cache/conftool/dbconfig/20230308-131022-marostegui.json
  • 13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45496 and previous config saved to /var/cache/conftool/dbconfig/20230308-130928-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45495 and previous config saved to /var/cache/conftool/dbconfig/20230308-130613-marostegui.json
  • 13:02 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:00 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45494 and previous config saved to /var/cache/conftool/dbconfig/20230308-125548-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45493 and previous config saved to /var/cache/conftool/dbconfig/20230308-125527-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45492 and previous config saved to /var/cache/conftool/dbconfig/20230308-125515-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45491 and previous config saved to /var/cache/conftool/dbconfig/20230308-125422-marostegui.json
  • 12:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45490 and previous config saved to /var/cache/conftool/dbconfig/20230308-124945-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45489 and previous config saved to /var/cache/conftool/dbconfig/20230308-124924-marostegui.json
  • 12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45488 and previous config saved to /var/cache/conftool/dbconfig/20230308-124344-marostegui.json
  • 12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45487 and previous config saved to /var/cache/conftool/dbconfig/20230308-124334-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45486 and previous config saved to /var/cache/conftool/dbconfig/20230308-124021-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45485 and previous config saved to /var/cache/conftool/dbconfig/20230308-123418-marostegui.json
  • 12:31 hnowlan: running authdns-update for r/890398
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45484 and previous config saved to /var/cache/conftool/dbconfig/20230308-122827-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45483 and previous config saved to /var/cache/conftool/dbconfig/20230308-122515-marostegui.json
  • 12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45482 and previous config saved to /var/cache/conftool/dbconfig/20230308-121912-marostegui.json
  • 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bullseye
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45480 and previous config saved to /var/cache/conftool/dbconfig/20230308-121321-marostegui.json
  • 12:10 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45479 and previous config saved to /var/cache/conftool/dbconfig/20230308-121009-marostegui.json
  • 12:09 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host urldownloader1003.wikimedia.org with OS bullseye
  • 12:08 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45478 and previous config saved to /var/cache/conftool/dbconfig/20230308-120406-marostegui.json
  • 12:01 claime: restbase-async back in standard state - T330651
  • 12:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 12:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T330651
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45477 and previous config saved to /var/cache/conftool/dbconfig/20230308-115935-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45476 and previous config saved to /var/cache/conftool/dbconfig/20230308-115924-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45475 and previous config saved to /var/cache/conftool/dbconfig/20230308-115913-marostegui.json
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45474 and previous config saved to /var/cache/conftool/dbconfig/20230308-115903-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45473 and previous config saved to /var/cache/conftool/dbconfig/20230308-115815-marostegui.json
  • 11:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 11:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 11:55 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 11:55 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T330651
  • 11:55 claime: restbase-async pooled in eqiad, depooling in codfw - T330651
  • 11:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool restbase-async in eqiad: T330651
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45472 and previous config saved to /var/cache/conftool/dbconfig/20230308-115252-root.json
  • 11:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 11:49 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 11:49 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool restbase-async in eqiad: T330651
  • 11:49 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9] (duration: 01m 30s)
  • 11:48 claime: Starting restbase-async switchback - T330651
  • 11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9]
  • 11:47 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9] (duration: 00m 07s)
  • 11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9]
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45471 and previous config saved to /var/cache/conftool/dbconfig/20230308-114652-marostegui.json
  • 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45470 and previous config saved to /var/cache/conftool/dbconfig/20230308-114642-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45469 and previous config saved to /var/cache/conftool/dbconfig/20230308-114553-root.json
  • 11:44 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bullseye
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45468 and previous config saved to /var/cache/conftool/dbconfig/20230308-114407-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45467 and previous config saved to /var/cache/conftool/dbconfig/20230308-114357-marostegui.json
  • 11:42 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 05m 09s)
  • 11:37 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
  • 11:37 otto@deploy2002: deploy aborted: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 09m 38s)
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45466 and previous config saved to /var/cache/conftool/dbconfig/20230308-113136-marostegui.json
  • 11:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45465 and previous config saved to /var/cache/conftool/dbconfig/20230308-112901-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45464 and previous config saved to /var/cache/conftool/dbconfig/20230308-112850-marostegui.json
  • 11:27 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
  • 11:27 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:27 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:26 akosiaris: T307943 upgrade kubernetes-client on deploy1002 deploy2002
  • 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
  • 11:23 claime: Traffic: authdns updated successfully for eqiad repool - T331285
  • 11:21 claime: Traffic: repool eqiad for user traffic - T331285
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45463 and previous config saved to /var/cache/conftool/dbconfig/20230308-111628-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45462 and previous config saved to /var/cache/conftool/dbconfig/20230308-111355-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45461 and previous config saved to /var/cache/conftool/dbconfig/20230308-111344-marostegui.json
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45460 and previous config saved to /var/cache/conftool/dbconfig/20230308-110907-marostegui.json
  • 11:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45459 and previous config saved to /var/cache/conftool/dbconfig/20230308-110846-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45458 and previous config saved to /var/cache/conftool/dbconfig/20230308-110306-marostegui.json
  • 11:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45457 and previous config saved to /var/cache/conftool/dbconfig/20230308-110121-marostegui.json
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45456 and previous config saved to /var/cache/conftool/dbconfig/20230308-105347-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45455 and previous config saved to /var/cache/conftool/dbconfig/20230308-105339-marostegui.json
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:51 otto@deploy2002: Finished deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334] (duration: 08m 20s)
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45454 and previous config saved to /var/cache/conftool/dbconfig/20230308-105043-marostegui.json
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45453 and previous config saved to /var/cache/conftool/dbconfig/20230308-105022-marostegui.json
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:49 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:42 otto@deploy2002: Started deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334]
  • 10:40 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45452 and previous config saved to /var/cache/conftool/dbconfig/20230308-103840-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45451 and previous config saved to /var/cache/conftool/dbconfig/20230308-103833-marostegui.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45450 and previous config saved to /var/cache/conftool/dbconfig/20230308-103515-marostegui.json
  • 10:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45449 and previous config saved to /var/cache/conftool/dbconfig/20230308-102334-marostegui.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45448 and previous config saved to /var/cache/conftool/dbconfig/20230308-102326-marostegui.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45447 and previous config saved to /var/cache/conftool/dbconfig/20230308-102009-marostegui.json
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45446 and previous config saved to /var/cache/conftool/dbconfig/20230308-101944-marostegui.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45445 and previous config saved to /var/cache/conftool/dbconfig/20230308-100826-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45444 and previous config saved to /var/cache/conftool/dbconfig/20230308-100502-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45443 and previous config saved to /var/cache/conftool/dbconfig/20230308-100437-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45442 and previous config saved to /var/cache/conftool/dbconfig/20230308-095804-marostegui.json
  • 09:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45441 and previous config saved to /var/cache/conftool/dbconfig/20230308-095742-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45440 and previous config saved to /var/cache/conftool/dbconfig/20230308-095320-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45439 and previous config saved to /var/cache/conftool/dbconfig/20230308-095259-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45438 and previous config saved to /var/cache/conftool/dbconfig/20230308-094931-marostegui.json
  • 09:45 claime: Rebuilding production-images for 894687
  • 09:43 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45437 and previous config saved to /var/cache/conftool/dbconfig/20230308-094236-marostegui.json
  • 09:42 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45436 and previous config saved to /var/cache/conftool/dbconfig/20230308-093752-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45435 and previous config saved to /var/cache/conftool/dbconfig/20230308-093424-marostegui.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45434 and previous config saved to /var/cache/conftool/dbconfig/20230308-093106-marostegui.json
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45433 and previous config saved to /var/cache/conftool/dbconfig/20230308-093045-marostegui.json
  • 09:30 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45432 and previous config saved to /var/cache/conftool/dbconfig/20230308-092729-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45431 and previous config saved to /var/cache/conftool/dbconfig/20230308-092246-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45430 and previous config saved to /var/cache/conftool/dbconfig/20230308-091538-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45429 and previous config saved to /var/cache/conftool/dbconfig/20230308-091223-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45428 and previous config saved to /var/cache/conftool/dbconfig/20230308-090739-marostegui.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45426 and previous config saved to /var/cache/conftool/dbconfig/20230308-090156-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45425 and previous config saved to /var/cache/conftool/dbconfig/20230308-090134-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45424 and previous config saved to /var/cache/conftool/dbconfig/20230308-090031-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45423 and previous config saved to /var/cache/conftool/dbconfig/20230308-085608-marostegui.json
  • 08:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45422 and previous config saved to /var/cache/conftool/dbconfig/20230308-085546-marostegui.json
  • 08:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:53 akosiaris: remove 10.64.64.0/21 and 10.192.64.0/21 from calico GlobalNetworkPolicies T326617
  • 08:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45421 and previous config saved to /var/cache/conftool/dbconfig/20230308-085159-root.json
  • 08:50 vgutierrez: re-enable HAProxy systemd service unit hardening in ulsfo - T323944
  • 08:49 moritzm: installing git security updates
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45420 and previous config saved to /var/cache/conftool/dbconfig/20230308-084628-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45419 and previous config saved to /var/cache/conftool/dbconfig/20230308-084525-marostegui.json
  • 08:41 marostegui: Deploy schema change on s3 eqiad dbmaint T329203
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45418 and previous config saved to /var/cache/conftool/dbconfig/20230308-084053-marostegui.json
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45417 and previous config saved to /var/cache/conftool/dbconfig/20230308-084040-marostegui.json
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45416 and previous config saved to /var/cache/conftool/dbconfig/20230308-083843-marostegui.json
  • 08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 15%: Repooling', diff saved to https://phabricator.wikimedia.org/P45415 and previous config saved to /var/cache/conftool/dbconfig/20230308-083731-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45414 and previous config saved to /var/cache/conftool/dbconfig/20230308-083654-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P45413 and previous config saved to /var/cache/conftool/dbconfig/20230308-083618-marostegui.json
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 08:34 marostegui: Deploy schema change on s3 eqiad dbmaint T329260
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
  • 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
  • 08:32 marostegui: Deploy schema change on s5 eqiad dbmaint T329260
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45412 and previous config saved to /var/cache/conftool/dbconfig/20230308-083121-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45411 and previous config saved to /var/cache/conftool/dbconfig/20230308-082533-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45410 and previous config saved to /var/cache/conftool/dbconfig/20230308-082149-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45409 and previous config saved to /var/cache/conftool/dbconfig/20230308-082112-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45408 and previous config saved to /var/cache/conftool/dbconfig/20230308-081809-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45407 and previous config saved to /var/cache/conftool/dbconfig/20230308-081748-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45406 and previous config saved to /var/cache/conftool/dbconfig/20230308-081614-marostegui.json
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 19 hosts with reason: Schema change
  • 08:15 marostegui: Deploy schema change on s8 eqiad dbmaint T329260
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 19 hosts with reason: Schema change
  • 08:15 marostegui: Deploy schema change on s7 eqiad dbmaint T329260
  • 08:15 marostegui: Deploy schema change on s4 eqiad dbmaint T329260
  • 08:15 marostegui: Deploy schema change on s1 eqiad dbmaint T329260
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2093.codfw.wmnet
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45405 and previous config saved to /var/cache/conftool/dbconfig/20230308-081027-marostegui.json
  • 08:09 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45404 and previous config saved to /var/cache/conftool/dbconfig/20230308-080644-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45403 and previous config saved to /var/cache/conftool/dbconfig/20230308-080431-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2093.codfw.wmnet
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45402 and previous config saved to /var/cache/conftool/dbconfig/20230308-080241-marostegui.json
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 20 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 20 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 22 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 22 hosts with reason: Schema change
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45401 and previous config saved to /var/cache/conftool/dbconfig/20230308-075857-marostegui.json
  • 07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45400 and previous config saved to /var/cache/conftool/dbconfig/20230308-075139-root.json
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:47 taavi@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard (duration: 04m 56s)
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45399 and previous config saved to /var/cache/conftool/dbconfig/20230308-074735-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1109', diff saved to https://phabricator.wikimedia.org/P45398 and previous config saved to /var/cache/conftool/dbconfig/20230308-074427-marostegui.json
  • 07:42 taavi@deploy2002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45397 and previous config saved to /var/cache/conftool/dbconfig/20230308-073633-root.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45396 and previous config saved to /var/cache/conftool/dbconfig/20230308-073228-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T330991', diff saved to https://phabricator.wikimedia.org/P45395 and previous config saved to /var/cache/conftool/dbconfig/20230308-073110-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1126 to s8 primary T330991', diff saved to https://phabricator.wikimedia.org/P45394 and previous config saved to /var/cache/conftool/dbconfig/20230308-073005-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45393 and previous config saved to /var/cache/conftool/dbconfig/20230308-072932-marostegui.json
  • 07:29 marostegui: Starting s8 eqiad failover from db1109 to db1126 - T330991
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45392 and previous config saved to /var/cache/conftool/dbconfig/20230308-072128-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 0 T330991', diff saved to https://phabricator.wikimedia.org/P45391 and previous config saved to /var/cache/conftool/dbconfig/20230308-070544-root.json
  • 07:05 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
  • 07:05 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45390 and previous config saved to /var/cache/conftool/dbconfig/20230308-070458-marostegui.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:53 marostegui: Failover m3 from db1101 to db1159 - T331387
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45389 and previous config saved to /var/cache/conftool/dbconfig/20230308-055038-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45388 and previous config saved to /var/cache/conftool/dbconfig/20230308-053531-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45387 and previous config saved to /var/cache/conftool/dbconfig/20230308-052024-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45386 and previous config saved to /var/cache/conftool/dbconfig/20230308-050517-marostegui.json
  • 04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45385 and previous config saved to /var/cache/conftool/dbconfig/20230308-040451-marostegui.json
  • 04:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45384 and previous config saved to /var/cache/conftool/dbconfig/20230308-040430-marostegui.json
  • 03:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45383 and previous config saved to /var/cache/conftool/dbconfig/20230308-034923-marostegui.json
  • 03:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45382 and previous config saved to /var/cache/conftool/dbconfig/20230308-033416-marostegui.json
  • 03:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45381 and previous config saved to /var/cache/conftool/dbconfig/20230308-031910-marostegui.json
  • 03:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45380 and previous config saved to /var/cache/conftool/dbconfig/20230308-031257-marostegui.json
  • 03:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45379 and previous config saved to /var/cache/conftool/dbconfig/20230308-031246-marostegui.json
  • 02:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45378 and previous config saved to /var/cache/conftool/dbconfig/20230308-025739-marostegui.json
  • 02:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45377 and previous config saved to /var/cache/conftool/dbconfig/20230308-024536-marostegui.json
  • 02:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45376 and previous config saved to /var/cache/conftool/dbconfig/20230308-024233-marostegui.json
  • 02:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45375 and previous config saved to /var/cache/conftool/dbconfig/20230308-023029-marostegui.json
  • 02:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45374 and previous config saved to /var/cache/conftool/dbconfig/20230308-022726-marostegui.json
  • 02:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45373 and previous config saved to /var/cache/conftool/dbconfig/20230308-022116-marostegui.json
  • 02:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 02:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45372 and previous config saved to /var/cache/conftool/dbconfig/20230308-022054-marostegui.json
  • 02:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45371 and previous config saved to /var/cache/conftool/dbconfig/20230308-021523-marostegui.json
  • 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45370 and previous config saved to /var/cache/conftool/dbconfig/20230308-020547-marostegui.json
  • 02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45369 and previous config saved to /var/cache/conftool/dbconfig/20230308-020016-marostegui.json
  • 01:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45368 and previous config saved to /var/cache/conftool/dbconfig/20230308-015921-marostegui.json
  • 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45367 and previous config saved to /var/cache/conftool/dbconfig/20230308-015040-marostegui.json
  • 01:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45366 and previous config saved to /var/cache/conftool/dbconfig/20230308-014659-marostegui.json
  • 01:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45365 and previous config saved to /var/cache/conftool/dbconfig/20230308-014637-marostegui.json
  • 01:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45364 and previous config saved to /var/cache/conftool/dbconfig/20230308-014415-marostegui.json
  • 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45363 and previous config saved to /var/cache/conftool/dbconfig/20230308-013534-marostegui.json
  • 01:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45362 and previous config saved to /var/cache/conftool/dbconfig/20230308-013131-marostegui.json
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45361 and previous config saved to /var/cache/conftool/dbconfig/20230308-012918-marostegui.json
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45360 and previous config saved to /var/cache/conftool/dbconfig/20230308-012908-marostegui.json
  • 01:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45359 and previous config saved to /var/cache/conftool/dbconfig/20230308-012901-marostegui.json
  • 01:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45358 and previous config saved to /var/cache/conftool/dbconfig/20230308-011624-marostegui.json
  • 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45357 and previous config saved to /var/cache/conftool/dbconfig/20230308-011401-marostegui.json
  • 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45356 and previous config saved to /var/cache/conftool/dbconfig/20230308-011354-marostegui.json
  • 01:09 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
  • 01:08 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bullseye
  • 01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45355 and previous config saved to /var/cache/conftool/dbconfig/20230308-010321-marostegui.json
  • 01:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 01:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45354 and previous config saved to /var/cache/conftool/dbconfig/20230308-010300-marostegui.json
  • 01:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45353 and previous config saved to /var/cache/conftool/dbconfig/20230308-010117-marostegui.json
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45352 and previous config saved to /var/cache/conftool/dbconfig/20230308-005848-marostegui.json
  • 00:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 00:51 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45351 and previous config saved to /var/cache/conftool/dbconfig/20230308-004753-marostegui.json
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45350 and previous config saved to /var/cache/conftool/dbconfig/20230308-004744-marostegui.json
  • 00:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45349 and previous config saved to /var/cache/conftool/dbconfig/20230308-004722-marostegui.json
  • 00:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45348 and previous config saved to /var/cache/conftool/dbconfig/20230308-004341-marostegui.json
  • 00:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45347 and previous config saved to /var/cache/conftool/dbconfig/20230308-004115-marostegui.json
  • 00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45346 and previous config saved to /var/cache/conftool/dbconfig/20230308-004049-marostegui.json
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45345 and previous config saved to /var/cache/conftool/dbconfig/20230308-003240-marostegui.json
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45344 and previous config saved to /var/cache/conftool/dbconfig/20230308-003216-marostegui.json
  • 00:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
  • 00:29 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir1002.eqiad.wmnet with OS bullseye
  • 00:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45343 and previous config saved to /var/cache/conftool/dbconfig/20230308-002543-marostegui.json
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45342 and previous config saved to /var/cache/conftool/dbconfig/20230308-001734-marostegui.json
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45341 and previous config saved to /var/cache/conftool/dbconfig/20230308-001709-marostegui.json
  • 00:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45340 and previous config saved to /var/cache/conftool/dbconfig/20230308-001036-marostegui.json
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45339 and previous config saved to /var/cache/conftool/dbconfig/20230308-000538-marostegui.json
  • 00:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45338 and previous config saved to /var/cache/conftool/dbconfig/20230308-000516-marostegui.json
  • 00:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45337 and previous config saved to /var/cache/conftool/dbconfig/20230308-000203-marostegui.json

2023-03-07

  • 23:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45336 and previous config saved to /var/cache/conftool/dbconfig/20230307-235529-marostegui.json
  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45335 and previous config saved to /var/cache/conftool/dbconfig/20230307-235010-marostegui.json
  • 23:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45334 and previous config saved to /var/cache/conftool/dbconfig/20230307-234858-marostegui.json
  • 23:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45333 and previous config saved to /var/cache/conftool/dbconfig/20230307-234837-marostegui.json
  • 23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45332 and previous config saved to /var/cache/conftool/dbconfig/20230307-234741-marostegui.json
  • 23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45331 and previous config saved to /var/cache/conftool/dbconfig/20230307-234715-marostegui.json
  • 23:40 ryankemper@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper (duration: 00m 15s)
  • 23:39 ryankemper@deploy2002: Started deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper
  • 23:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45329 and previous config saved to /var/cache/conftool/dbconfig/20230307-233503-marostegui.json
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45328 and previous config saved to /var/cache/conftool/dbconfig/20230307-233330-marostegui.json
  • 23:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
  • 23:32 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
  • 23:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45327 and previous config saved to /var/cache/conftool/dbconfig/20230307-233209-marostegui.json
  • 23:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 23:30 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
  • 23:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45326 and previous config saved to /var/cache/conftool/dbconfig/20230307-231957-marostegui.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45325 and previous config saved to /var/cache/conftool/dbconfig/20230307-231824-marostegui.json
  • 23:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45324 and previous config saved to /var/cache/conftool/dbconfig/20230307-231702-marostegui.json
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45323 and previous config saved to /var/cache/conftool/dbconfig/20230307-230317-marostegui.json
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45322 and previous config saved to /var/cache/conftool/dbconfig/20230307-230156-marostegui.json
  • 22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45321 and previous config saved to /var/cache/conftool/dbconfig/20230307-225951-marostegui.json
  • 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bullseye
  • 22:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45319 and previous config saved to /var/cache/conftool/dbconfig/20230307-225110-marostegui.json
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45318 and previous config saved to /var/cache/conftool/dbconfig/20230307-224803-marostegui.json
  • 22:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45317 and previous config saved to /var/cache/conftool/dbconfig/20230307-224742-marostegui.json
  • 22:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bullseye
  • 22:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 22:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 22:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45316 and previous config saved to /var/cache/conftool/dbconfig/20230307-223603-marostegui.json
  • 22:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45315 and previous config saved to /var/cache/conftool/dbconfig/20230307-223235-marostegui.json
  • 22:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 22:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 22:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2002.codfw.wmnet with OS bullseye
  • 22:26 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 22:25 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
  • 22:23 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bullseye
  • 22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45314 and previous config saved to /var/cache/conftool/dbconfig/20230307-222056-marostegui.json
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45313 and previous config saved to /var/cache/conftool/dbconfig/20230307-221931-marostegui.json
  • 22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45312 and previous config saved to /var/cache/conftool/dbconfig/20230307-221854-marostegui.json
  • 22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45311 and previous config saved to /var/cache/conftool/dbconfig/20230307-221729-marostegui.json
  • 22:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1001.eqiad.wmnet with OS bullseye
  • 22:14 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
  • 22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4002.ulsfo.wmnet
  • 22:13 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4002.ulsfo.wmnet with OS bullseye
  • 22:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 22:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45310 and previous config saved to /var/cache/conftool/dbconfig/20230307-220550-marostegui.json
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45309 and previous config saved to /var/cache/conftool/dbconfig/20230307-220438-marostegui.json
  • 22:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45308 and previous config saved to /var/cache/conftool/dbconfig/20230307-220416-marostegui.json
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45307 and previous config saved to /var/cache/conftool/dbconfig/20230307-220348-marostegui.json
  • 22:03 mforns@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: (no justification provided) (duration: 00m 18s)
  • 22:03 mforns@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: (no justification provided)
  • 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45306 and previous config saved to /var/cache/conftool/dbconfig/20230307-220222-marostegui.json
  • 21:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6002.drmrs.wmnet with OS bullseye
  • 21:58 inflatador: bking@cumin2002 depool elastic row D hosts to prepare for T322082
  • 21:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 7 hosts with reason: re-rack
  • 21:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 7 hosts with reason: re-rack
  • 21:56 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2001.codfw.wmnet with OS bullseye
  • 21:56 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
  • 21:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 21:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3002.esams.wmnet
  • 21:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3002.esams.wmnet with OS bullseye
  • 21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 21:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45305 and previous config saved to /var/cache/conftool/dbconfig/20230307-214910-marostegui.json
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45304 and previous config saved to /var/cache/conftool/dbconfig/20230307-214841-marostegui.json
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45303 and previous config saved to /var/cache/conftool/dbconfig/20230307-214824-marostegui.json
  • 21:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45302 and previous config saved to /var/cache/conftool/dbconfig/20230307-214802-marostegui.json
  • 21:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 21:43 TheresNoTime: close UTC late backport window
  • 21:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 21:41 inflatador: bking@cumin2002 ban elastic row D hosts to prepare for T322082
  • 21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2073.codfw.wmnet with OS bullseye
  • 21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4002.ulsfo.wmnet with OS bullseye
  • 21:38 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4002.ulsfo.wmnet
  • 21:37 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4001.ulsfo.wmnet
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4001.ulsfo.wmnet with OS bullseye
  • 21:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
  • 21:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45301 and previous config saved to /var/cache/conftool/dbconfig/20230307-213403-marostegui.json
  • 21:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45300 and previous config saved to /var/cache/conftool/dbconfig/20230307-213334-marostegui.json
  • 21:33 samtar@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) (duration: 09m 11s)
  • 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45299 and previous config saved to /var/cache/conftool/dbconfig/20230307-213256-marostegui.json
  • 21:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
  • 21:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:27 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6002.drmrs.wmnet with OS bullseye
  • 21:25 samtar@deploy2002: sbailey and samtar: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:23 samtar@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612)
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45298 and previous config saved to /var/cache/conftool/dbconfig/20230307-212138-marostegui.json
  • 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 21:20 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.26 refs T330204
  • 21:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 21:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45297 and previous config saved to /var/cache/conftool/dbconfig/20230307-211857-marostegui.json
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45296 and previous config saved to /var/cache/conftool/dbconfig/20230307-211749-marostegui.json
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45295 and previous config saved to /var/cache/conftool/dbconfig/20230307-211744-marostegui.json
  • 21:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 21:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3002.esams.wmnet with OS bullseye
  • 21:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45294 and previous config saved to /var/cache/conftool/dbconfig/20230307-211723-marostegui.json
  • 21:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 21:17 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3002.esams.wmnet
  • 21:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3001.esams.wmnet
  • 21:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3001.esams.wmnet with OS bullseye
  • 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45293 and previous config saved to /var/cache/conftool/dbconfig/20230307-211159-marostegui.json
  • 21:10 bblack: lvs500[45]: re-enabling/pooling, back to normal flow
  • 21:10 jhuneidi@deploy2002: Pruned MediaWiki: 1.40.0-wmf.24 (duration: 02m 08s)
  • 21:07 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 43m 53s)
  • 21:07 bking@deploy2002: Finished deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk (duration: 00m 41s)
  • 21:07 bking@deploy2002: Started deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk
  • 21:06 bblack: lvs500[45]: disabling puppet and stopping pybal, all eqsin traffic through lvs5006 temporarily...
  • 21:03 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4001.ulsfo.wmnet with OS bullseye
  • 21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.ulsfo.wmnet
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45292 and previous config saved to /var/cache/conftool/dbconfig/20230307-210243-marostegui.json
  • 21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.drmrs.wmnet
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45291 and previous config saved to /var/cache/conftool/dbconfig/20230307-210216-marostegui.json
  • 20:58 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: test deploy new airflow instance (duration: 02m 03s)
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45290 and previous config saved to /var/cache/conftool/dbconfig/20230307-205653-marostegui.json
  • 20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
  • 20:56 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 01s)
  • 20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
  • 20:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
  • 20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45289 and previous config saved to /var/cache/conftool/dbconfig/20230307-204925-marostegui.json
  • 20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45288 and previous config saved to /var/cache/conftool/dbconfig/20230307-204904-marostegui.json
  • 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45287 and previous config saved to /var/cache/conftool/dbconfig/20230307-204710-marostegui.json
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45286 and previous config saved to /var/cache/conftool/dbconfig/20230307-204146-marostegui.json
  • 20:35 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3001.esams.wmnet with OS bullseye
  • 20:35 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3001.drmrs.wmnet
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45284 and previous config saved to /var/cache/conftool/dbconfig/20230307-203357-marostegui.json
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45283 and previous config saved to /var/cache/conftool/dbconfig/20230307-203203-marostegui.json
  • 20:30 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 02s)
  • 20:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:30 ebernhardson@deploy2002: Finished deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance (duration: 00m 05s)
  • 20:29 ebernhardson@deploy2002: Started deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance
  • 20:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2073.codfw.wmnet with OS bullseye
  • 20:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45282 and previous config saved to /var/cache/conftool/dbconfig/20230307-202713-marostegui.json
  • 20:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 20:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45281 and previous config saved to /var/cache/conftool/dbconfig/20230307-202652-marostegui.json
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45280 and previous config saved to /var/cache/conftool/dbconfig/20230307-202640-marostegui.json
  • 20:24 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6002.eqsin.wmnet
  • 20:19 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6002.drmrs.wmnet with OS bullseye
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45279 and previous config saved to /var/cache/conftool/dbconfig/20230307-201851-marostegui.json
  • 20:17 bking@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk (duration: 01m 18s)
  • 20:16 bking@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk
  • 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45277 and previous config saved to /var/cache/conftool/dbconfig/20230307-201414-marostegui.json
  • 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 20:14 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 01m 49s)
  • 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45276 and previous config saved to /var/cache/conftool/dbconfig/20230307-201353-marostegui.json
  • 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45274 and previous config saved to /var/cache/conftool/dbconfig/20230307-201145-marostegui.json
  • 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45273 and previous config saved to /var/cache/conftool/dbconfig/20230307-200344-marostegui.json
  • 20:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45272 and previous config saved to /var/cache/conftool/dbconfig/20230307-195846-marostegui.json
  • 19:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45270 and previous config saved to /var/cache/conftool/dbconfig/20230307-195639-marostegui.json
  • 19:51 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS bullseye
  • 19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45268 and previous config saved to /var/cache/conftool/dbconfig/20230307-194934-marostegui.json
  • 19:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45267 and previous config saved to /var/cache/conftool/dbconfig/20230307-194913-marostegui.json
  • 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45266 and previous config saved to /var/cache/conftool/dbconfig/20230307-194340-marostegui.json
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45265 and previous config saved to /var/cache/conftool/dbconfig/20230307-194132-marostegui.json
  • 19:40 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 00m 07s)
  • 19:40 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6002.drmrs.wmnet with OS bullseye
  • 19:40 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
  • 19:40 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6002.eqsin.wmnet
  • 19:40 ejegg: payments-wiki upgraded from 346e6f61 to 05a5e09a
  • 19:39 jhuneidi@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.40.0-wmf.26" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 255. (duration: 00m 02s)
  • 19:39 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 19:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
  • 19:37 brett@cumin2002: conftool action : set/pooled=yess; selector: name=ncredir6001.eqsin.wmnet
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45264 and previous config saved to /var/cache/conftool/dbconfig/20230307-193639-marostegui.json
  • 19:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45263 and previous config saved to /var/cache/conftool/dbconfig/20230307-193617-marostegui.json
  • 19:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
  • 19:35 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS bullseye
  • 19:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45262 and previous config saved to /var/cache/conftool/dbconfig/20230307-193406-marostegui.json
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45261 and previous config saved to /var/cache/conftool/dbconfig/20230307-192833-marostegui.json
  • 19:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 19:21 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6001.drmrs.wmnet with OS bullseye
  • 19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45260 and previous config saved to /var/cache/conftool/dbconfig/20230307-192111-marostegui.json
  • 19:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45259 and previous config saved to /var/cache/conftool/dbconfig/20230307-191900-marostegui.json
  • 19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45258 and previous config saved to /var/cache/conftool/dbconfig/20230307-191717-marostegui.json
  • 19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45257 and previous config saved to /var/cache/conftool/dbconfig/20230307-191656-marostegui.json
  • 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2072.codfw.wmnet with OS bullseye
  • 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:08 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5002.eqsin.wmnet with OS bullseye
  • 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4002.ulsfo.wmnet with OS bullseye
  • 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45256 and previous config saved to /var/cache/conftool/dbconfig/20230307-190604-marostegui.json
  • 19:04 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45255 and previous config saved to /var/cache/conftool/dbconfig/20230307-190353-marostegui.json
  • 19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45254 and previous config saved to /var/cache/conftool/dbconfig/20230307-190149-marostegui.json
  • 19:01 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS bullseye
  • 18:59 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 12m 38s)
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 18:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45253 and previous config saved to /var/cache/conftool/dbconfig/20230307-185058-marostegui.json
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
  • 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45252 and previous config saved to /var/cache/conftool/dbconfig/20230307-184907-marostegui.json
  • 18:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:47 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
  • 18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45251 and previous config saved to /var/cache/conftool/dbconfig/20230307-184642-marostegui.json
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45250 and previous config saved to /var/cache/conftool/dbconfig/20230307-184506-marostegui.json
  • 18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45249 and previous config saved to /var/cache/conftool/dbconfig/20230307-184428-marostegui.json
  • 18:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6001.drmrs.wmnet with OS bullseye
  • 18:39 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6001.eqsin.wmnet
  • 18:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
  • 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45248 and previous config saved to /var/cache/conftool/dbconfig/20230307-183810-marostegui.json
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:35 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir5002.eqsin.wmnet
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45247 and previous config saved to /var/cache/conftool/dbconfig/20230307-183136-marostegui.json
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45246 and previous config saved to /var/cache/conftool/dbconfig/20230307-182921-marostegui.json
  • 18:29 dancy: dancy@deploy2002: Fixing up /srv/mediawiki-staging/.git permissions
  • 18:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2072.codfw.wmnet with OS bullseye
  • 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2071.codfw.wmnet with OS bullseye
  • 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45245 and previous config saved to /var/cache/conftool/dbconfig/20230307-182304-marostegui.json
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45244 and previous config saved to /var/cache/conftool/dbconfig/20230307-182035-marostegui.json
  • 18:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:20 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 18:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45243 and previous config saved to /var/cache/conftool/dbconfig/20230307-182013-marostegui.json
  • 18:19 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5001.eqsin.wmnet with OS bullseye
  • 18:18 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:17 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3002.esams.wmnet with OS bullseye
  • 18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5002.eqsin.wmnet with OS bullseye
  • 18:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45242 and previous config saved to /var/cache/conftool/dbconfig/20230307-181414-marostegui.json
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45241 and previous config saved to /var/cache/conftool/dbconfig/20230307-180757-marostegui.json
  • 18:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45240 and previous config saved to /var/cache/conftool/dbconfig/20230307-180506-marostegui.json
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
  • 17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45239 and previous config saved to /var/cache/conftool/dbconfig/20230307-175907-marostegui.json
  • 17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45238 and previous config saved to /var/cache/conftool/dbconfig/20230307-175314-marostegui.json
  • 17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45237 and previous config saved to /var/cache/conftool/dbconfig/20230307-175251-marostegui.json
  • 17:51 inflatador: bking@cumin2002 repool wdqs hosts post-maintenance T329073
  • 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45236 and previous config saved to /var/cache/conftool/dbconfig/20230307-175000-marostegui.json
  • 17:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 17:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45235 and previous config saved to /var/cache/conftool/dbconfig/20230307-174848-marostegui.json
  • 17:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:47 volans@cumin1001: START - Cookbook sre.network.cf
  • 17:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 17:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3002.esams.wmnet with OS bullseye
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45234 and previous config saved to /var/cache/conftool/dbconfig/20230307-173923-marostegui.json
  • 17:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45233 and previous config saved to /var/cache/conftool/dbconfig/20230307-173901-marostegui.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45232 and previous config saved to /var/cache/conftool/dbconfig/20230307-173453-marostegui.json
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45231 and previous config saved to /var/cache/conftool/dbconfig/20230307-173341-marostegui.json
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3001.esams.wmnet with OS bullseye
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45229 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45230 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
  • 17:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bullseye
  • 17:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45228 and previous config saved to /var/cache/conftool/dbconfig/20230307-172333-marostegui.json
  • 17:22 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5002.eqsin.wmnet with OS bullseye
  • 17:21 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir5002.eqsin.wmnet
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45227 and previous config saved to /var/cache/conftool/dbconfig/20230307-171834-marostegui.json
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
  • 17:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
  • 17:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45226 and previous config saved to /var/cache/conftool/dbconfig/20230307-170848-marostegui.json
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45225 and previous config saved to /var/cache/conftool/dbconfig/20230307-170826-marostegui.json
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45224 and previous config saved to /var/cache/conftool/dbconfig/20230307-170328-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45223 and previous config saved to /var/cache/conftool/dbconfig/20230307-170215-marostegui.json
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45222 and previous config saved to /var/cache/conftool/dbconfig/20230307-170154-marostegui.json
  • 16:58 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 16:57 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3001.esams.wmnet with OS bullseye
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45221 and previous config saved to /var/cache/conftool/dbconfig/20230307-165340-marostegui.json
  • 16:53 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2002.codfw.wmnet with OS bullseye
  • 16:53 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@9924c93]: (no justification provided) (duration: 00m 11s)
  • 16:53 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@9924c93]: (no justification provided)
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45220 and previous config saved to /var/cache/conftool/dbconfig/20230307-165319-marostegui.json
  • 16:52 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS bullseye
  • 16:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
  • 16:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45219 and previous config saved to /var/cache/conftool/dbconfig/20230307-164647-marostegui.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45218 and previous config saved to /var/cache/conftool/dbconfig/20230307-164010-marostegui.json
  • 16:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45217 and previous config saved to /var/cache/conftool/dbconfig/20230307-163948-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45216 and previous config saved to /var/cache/conftool/dbconfig/20230307-163813-marostegui.json
  • 16:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45215 and previous config saved to /var/cache/conftool/dbconfig/20230307-163140-marostegui.json
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45214 and previous config saved to /var/cache/conftool/dbconfig/20230307-162616-marostegui.json
  • 16:26 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45213 and previous config saved to /var/cache/conftool/dbconfig/20230307-162554-marostegui.json
  • 16:25 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2001.codfw.wmnet with OS bullseye
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2071.codfw.wmnet with OS bullseye
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45212 and previous config saved to /var/cache/conftool/dbconfig/20230307-162442-marostegui.json
  • 16:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes2016.codfw.wmnet
  • 16:21 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bullseye
  • 16:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1037']
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45211 and previous config saved to /var/cache/conftool/dbconfig/20230307-161634-marostegui.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45210 and previous config saved to /var/cache/conftool/dbconfig/20230307-161132-marostegui.json
  • 16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45209 and previous config saved to /var/cache/conftool/dbconfig/20230307-161111-marostegui.json
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45208 and previous config saved to /var/cache/conftool/dbconfig/20230307-161047-marostegui.json
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45207 and previous config saved to /var/cache/conftool/dbconfig/20230307-160935-marostegui.json
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
  • 16:04 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bullseye
  • 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45206 and previous config saved to /var/cache/conftool/dbconfig/20230307-155604-marostegui.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45205 and previous config saved to /var/cache/conftool/dbconfig/20230307-155541-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45204 and previous config saved to /var/cache/conftool/dbconfig/20230307-155428-marostegui.json
  • 15:53 marostegui: Failover m1-master T330165
  • 15:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 15:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 15:46 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 15:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 15:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
  • 15:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45203 and previous config saved to /var/cache/conftool/dbconfig/20230307-154058-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45202 and previous config saved to /var/cache/conftool/dbconfig/20230307-154049-marostegui.json
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45201 and previous config saved to /var/cache/conftool/dbconfig/20230307-154034-marostegui.json
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:36 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1002.eqiad.wmnet with OS bullseye
  • 15:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bullseye
  • 15:29 moritzm: installing libde265 security updates
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:28 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45200 and previous config saved to /var/cache/conftool/dbconfig/20230307-152729-marostegui.json
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 15:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 15:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5001.eqsin.wmnet with OS bullseye
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45199 and previous config saved to /var/cache/conftool/dbconfig/20230307-152545-marostegui.json
  • 15:25 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
  • 15:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: sync
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: sync
  • 15:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1039']
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:20 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: sync
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45198 and previous config saved to /var/cache/conftool/dbconfig/20230307-152037-marostegui.json
  • 15:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1039']
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:19 Emperor: pool thanos-fe1001 T329073
  • 15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 15:19 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 15:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 15:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 15:16 Emperor: pool ms-fe1009 T329073
  • 15:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:16 Emperor: pool moss-fe1001 T329073
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: sync
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: sync
  • 15:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: sync
  • 15:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1001.eqiad.wmnet with OS bullseye
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync
  • 15:04 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 15:04 bblack: dns1001 - restarted prometheus-bird-exporter
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/SERVICE_NAME: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/SERVICE_NAME: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 15:01 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:01 sukhe: repooling dns1001: authdns-update can now be run again
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase101[69].eqiad.wmnet
  • 14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[18].eqiad.wmnet
  • 14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1031.eqiad.wmnet
  • 14:58 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: sync
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: sync
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 14:56 inflatador: bking@cumin2002 unban production row A elastic nodes from all clusters T329073
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync
  • 14:55 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: sync
  • 14:54 akosiaris: T331126 toolhub deployed, https://toolhub.wikimedia.org/ operational again
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 14:52 inflatador: bking@cumin2002 unban row A cloudelastic nodes T329073
  • 14:47 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T331126
  • 14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet
  • 14:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 238 hosts
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 238 hosts
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mr1-eqiad
  • 14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for mr1-eqiad
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:41 moritzm: enabling Puppet in eqiad/esams/drmrs after completed Switch maintenance T329073
  • 14:40 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:40 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:36 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:20 topranks: issuing reboot to upgrade asw2-a-eqiad virtual-chassis to Junos 21.4
  • 14:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
  • 14:17 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 14:16 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
  • 14:16 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
  • 14:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1037']
  • 14:13 akosiaris: kubectl cordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T329073
  • 14:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2070.codfw.wmnet with OS bullseye
  • 14:12 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
  • 14:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1038']
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 14:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 14:07 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
  • 14:07 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1031.eqiad.wmnet
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[18].eqiad.wmnet
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase101[69].eqiad.wmnet
  • 14:02 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 13:59 jbond: failover pki.discovery.wmnet to codfw T329073
  • 13:58 Emperor: depool thanos-fe1001 T329073
  • 13:55 Emperor: depool ms-fe1009 T329073
  • 13:55 Emperor: depool moss-fe1001 T329073
  • 13:54 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 13:50 moritzm: disabling Puppet in eqiad/esams/drmrs for forthcoming Switch maintenance T329073
  • 13:50 topranks: staging Junos files to individual VC members eqiad row A to prep for upgrade
  • 13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 13:04 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
  • 13:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 12:57 sukhe: removing dns1001 from authdns_servers for T329073
  • 12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 12:52 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 12:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 12:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 12:38 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 12:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 12:27 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 12:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 12:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 12:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:17 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:15 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 12:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:13 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:13 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:12 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:11 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
  • 12:09 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:09 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:08 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:07 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:06 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 12:06 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:06 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:03 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 12:03 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 12:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:00 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
  • 11:56 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
  • 11:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 11:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 11:45 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:44 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:43 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 11:37 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 11:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 11:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 11:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 11:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1005.eqiad.wmnet with OS bullseye
  • 11:28 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 11:23 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1006.eqiad.wmnet with OS bullseye
  • 11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 11:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 11:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45193 and previous config saved to /var/cache/conftool/dbconfig/20230307-111421-marostegui.json
  • 11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 11:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
  • 11:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
  • 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
  • 11:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45192 and previous config saved to /var/cache/conftool/dbconfig/20230307-105914-marostegui.json
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 10:58 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 10:57 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 10:56 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 10:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1006.eqiad.wmnet with OS bullseye
  • 10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1005.eqiad.wmnet with OS bullseye
  • 10:53 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 10:51 akosiaris: manually label kubemaster1001, kubemaster1002 giving them role master T307943
  • 10:48 arturo: apt2001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45191 and previous config saved to /var/cache/conftool/dbconfig/20230307-104408-marostegui.json
  • 10:39 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1001.eqiad.wmnet with OS bullseye
  • 10:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1002.eqiad.wmnet with OS bullseye
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45190 and previous config saved to /var/cache/conftool/dbconfig/20230307-102901-marostegui.json
  • 10:28 arturo: apt1001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
  • 10:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
  • 10:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45189 and previous config saved to /var/cache/conftool/dbconfig/20230307-100807-marostegui.json
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45188 and previous config saved to /var/cache/conftool/dbconfig/20230307-100745-marostegui.json
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1002.eqiad.wmnet with OS bullseye
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1001.eqiad.wmnet with OS bullseye
  • 10:05 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1005.eqiad.wmnet with OS bullseye
  • 09:54 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1006.eqiad.wmnet with OS bullseye
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45187 and previous config saved to /var/cache/conftool/dbconfig/20230307-095239-marostegui.json
  • 09:39 akosiaris: schedule downtime for PyBal backends health on lvs1019, lvs1020
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45186 and previous config saved to /var/cache/conftool/dbconfig/20230307-093732-marostegui.json
  • 09:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1004.eqiad.wmnet with OS bullseye
  • 09:33 moritzm: installing apr-util security updates on Bullseye
  • 09:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45184 and previous config saved to /var/cache/conftool/dbconfig/20230307-092226-marostegui.json
  • 09:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
  • 09:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
  • 09:14 moritzm: installing PHP 7.4 security updates (as packaged in Debian Bullseye, not our internal build for Buster)
  • 09:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1006.eqiad.wmnet with OS bullseye
  • 09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1005.eqiad.wmnet with OS bullseye
  • 09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1004.eqiad.wmnet with OS bullseye
  • 09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45182 and previous config saved to /var/cache/conftool/dbconfig/20230307-090130-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45181 and previous config saved to /var/cache/conftool/dbconfig/20230307-090109-marostegui.json
  • 08:51 akosiaris: T331126 Scheduled 24H downtime for all wikikube eqiad hosts and all LVS services powered by the cluster
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45180 and previous config saved to /var/cache/conftool/dbconfig/20230307-084602-marostegui.json
  • 08:43 dcausse: closing the UTC morning backport window
  • 08:42 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1003.eqiad.wmnet with OS bullseye
  • 08:41 dcausse@deploy2002: Finished scap: Backport for Properly pass the page id on page moves (T331127) (duration: 16m 34s)
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1101 from dbctl T329352', diff saved to https://phabricator.wikimedia.org/P45179 and previous config saved to /var/cache/conftool/dbconfig/20230307-083542-marostegui.json
  • 08:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
  • 08:33 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45178 and previous config saved to /var/cache/conftool/dbconfig/20230307-083056-marostegui.json
  • 08:28 dcausse@deploy2002: dcausse: Backport for Properly pass the page id on page moves (T331127) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:24 dcausse@deploy2002: Started scap: Backport for Properly pass the page id on page moves (T331127)
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:22 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 08:20 marostegui: Failover m3 from db1159 to db1101 - T331384
  • 08:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:19 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
  • 08:18 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45177 and previous config saved to /var/cache/conftool/dbconfig/20230307-081549-marostegui.json
  • 08:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:14 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:14 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:09 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 08:07 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1003.eqiad.wmnet with OS bullseye
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45176 and previous config saved to /var/cache/conftool/dbconfig/20230307-075453-marostegui.json
  • 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45175 and previous config saved to /var/cache/conftool/dbconfig/20230307-075443-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45174 and previous config saved to /var/cache/conftool/dbconfig/20230307-073936-marostegui.json
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
  • 07:34 vgutierrez: enable haproxy systemd service unit hardening in cp4044 - T323944
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) T331381', diff saved to https://phabricator.wikimedia.org/P45172 and previous config saved to /var/cache/conftool/dbconfig/20230307-072454-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45171 and previous config saved to /var/cache/conftool/dbconfig/20230307-072429-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45170 and previous config saved to /var/cache/conftool/dbconfig/20230307-070923-marostegui.json
  • 06:54 marostegui: dbmaint eqiad s1 T329203
  • 06:53 marostegui: dbmaint eqiad s4 T329203
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45169 and previous config saved to /var/cache/conftool/dbconfig/20230307-064752-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45168 and previous config saved to /var/cache/conftool/dbconfig/20230307-064730-marostegui.json
  • 06:43 marostegui: dbmaint eqiad s4 T328817
  • 06:43 marostegui: dbmaint eqiad s1 T328817
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2095.codfw.wmnet
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:34 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45167 and previous config saved to /var/cache/conftool/dbconfig/20230307-063223-marostegui.json
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2095.codfw.wmnet
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45166 and previous config saved to /var/cache/conftool/dbconfig/20230307-061717-marostegui.json
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45165 and previous config saved to /var/cache/conftool/dbconfig/20230307-060210-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45164 and previous config saved to /var/cache/conftool/dbconfig/20230307-054153-marostegui.json
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45163 and previous config saved to /var/cache/conftool/dbconfig/20230307-054127-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45162 and previous config saved to /var/cache/conftool/dbconfig/20230307-052620-marostegui.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45161 and previous config saved to /var/cache/conftool/dbconfig/20230307-051113-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45160 and previous config saved to /var/cache/conftool/dbconfig/20230307-045607-marostegui.json
  • 03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45159 and previous config saved to /var/cache/conftool/dbconfig/20230307-035541-marostegui.json
  • 03:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45158 and previous config saved to /var/cache/conftool/dbconfig/20230307-035520-marostegui.json
  • 03:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45157 and previous config saved to /var/cache/conftool/dbconfig/20230307-034013-marostegui.json
  • 03:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45156 and previous config saved to /var/cache/conftool/dbconfig/20230307-032506-marostegui.json
  • 03:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45155 and previous config saved to /var/cache/conftool/dbconfig/20230307-031000-marostegui.json
  • 02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45154 and previous config saved to /var/cache/conftool/dbconfig/20230307-024912-marostegui.json
  • 02:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 02:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 02:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45153 and previous config saved to /var/cache/conftool/dbconfig/20230307-024850-marostegui.json
  • 02:34 eileen: civicrm upgraded from fe2c06f6 to dbe3b716
  • 02:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45152 and previous config saved to /var/cache/conftool/dbconfig/20230307-023344-marostegui.json
  • 02:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45151 and previous config saved to /var/cache/conftool/dbconfig/20230307-021837-marostegui.json
  • 02:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45150 and previous config saved to /var/cache/conftool/dbconfig/20230307-020330-marostegui.json
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45149 and previous config saved to /var/cache/conftool/dbconfig/20230307-014152-marostegui.json
  • 01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45148 and previous config saved to /var/cache/conftool/dbconfig/20230307-014130-marostegui.json
  • 01:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45147 and previous config saved to /var/cache/conftool/dbconfig/20230307-012624-marostegui.json
  • 01:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45146 and previous config saved to /var/cache/conftool/dbconfig/20230307-011117-marostegui.json
  • 00:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45145 and previous config saved to /var/cache/conftool/dbconfig/20230307-005611-marostegui.json
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45144 and previous config saved to /var/cache/conftool/dbconfig/20230307-003547-marostegui.json
  • 00:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45143 and previous config saved to /var/cache/conftool/dbconfig/20230307-003525-marostegui.json
  • 00:23 mutante: people* - determined which users did not have a public_html dir in codfw but did in eqiad. created that dir and rsynced via push from people1003 to people2002 for the 7 affected users. re-enabled the temporarily disabled puppet to restore the live-hacked rsync config. T330091
  • 00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45142 and previous config saved to /var/cache/conftool/dbconfig/20230307-002019-marostegui.json
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45141 and previous config saved to /var/cache/conftool/dbconfig/20230307-000512-marostegui.json

2023-03-06

  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45140 and previous config saved to /var/cache/conftool/dbconfig/20230306-235006-marostegui.json
  • 23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45139 and previous config saved to /var/cache/conftool/dbconfig/20230306-232933-marostegui.json
  • 23:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
  • 23:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
  • 23:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
  • 23:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
  • 23:16 inflatador: bking@cumin2002 ban row A cloudelastic hosts T329073
  • 23:11 mforns@deploy2002: Finished deploy [airflow-dags/analytics@53a0280]: (no justification provided) (duration: 00m 17s)
  • 23:11 mforns@deploy2002: Started deploy [airflow-dags/analytics@53a0280]: (no justification provided)
  • 23:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:05 ryankemper: T329073 Pre-emptively depooled internal wdqs hosts `wdqs10[03,11]`
  • 23:04 inflatador: bking@cumin2002 'depool wcqs and wdqs row A hosts T329073'
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45138 and previous config saved to /var/cache/conftool/dbconfig/20230306-223044-marostegui.json
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45137 and previous config saved to /var/cache/conftool/dbconfig/20230306-221537-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45136 and previous config saved to /var/cache/conftool/dbconfig/20230306-220031-marostegui.json
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45135 and previous config saved to /var/cache/conftool/dbconfig/20230306-214524-marostegui.json
  • 21:45 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45133 and previous config saved to /var/cache/conftool/dbconfig/20230306-212358-marostegui.json
  • 21:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 21:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 21:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45132 and previous config saved to /var/cache/conftool/dbconfig/20230306-212336-marostegui.json
  • 21:19 zabe@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) (duration: 16m 59s)
  • 21:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45131 and previous config saved to /var/cache/conftool/dbconfig/20230306-210829-marostegui.json
  • 21:04 zabe@deploy2002: zabe and sbailey: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:02 zabe@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612)
  • 20:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45130 and previous config saved to /var/cache/conftool/dbconfig/20230306-205322-marostegui.json
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45129 and previous config saved to /var/cache/conftool/dbconfig/20230306-203816-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45128 and previous config saved to /var/cache/conftool/dbconfig/20230306-201704-marostegui.json
  • 20:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45127 and previous config saved to /var/cache/conftool/dbconfig/20230306-201643-marostegui.json
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45126 and previous config saved to /var/cache/conftool/dbconfig/20230306-200843-marostegui.json
  • 20:04 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45125 and previous config saved to /var/cache/conftool/dbconfig/20230306-200354-marostegui.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45124 and previous config saved to /var/cache/conftool/dbconfig/20230306-200136-marostegui.json
  • 19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45123 and previous config saved to /var/cache/conftool/dbconfig/20230306-195336-marostegui.json
  • 19:51 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 19:49 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 19:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45122 and previous config saved to /var/cache/conftool/dbconfig/20230306-194848-marostegui.json
  • 19:48 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 19:47 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45121 and previous config saved to /var/cache/conftool/dbconfig/20230306-194630-marostegui.json
  • 19:45 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 19:44 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45120 and previous config saved to /var/cache/conftool/dbconfig/20230306-193829-marostegui.json
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45119 and previous config saved to /var/cache/conftool/dbconfig/20230306-193341-marostegui.json
  • 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45118 and previous config saved to /var/cache/conftool/dbconfig/20230306-193123-marostegui.json
  • 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45117 and previous config saved to /var/cache/conftool/dbconfig/20230306-192322-marostegui.json
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45116 and previous config saved to /var/cache/conftool/dbconfig/20230306-191835-marostegui.json
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45115 and previous config saved to /var/cache/conftool/dbconfig/20230306-191622-marostegui.json
  • 19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45114 and previous config saved to /var/cache/conftool/dbconfig/20230306-191600-marostegui.json
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45113 and previous config saved to /var/cache/conftool/dbconfig/20230306-190943-marostegui.json
  • 19:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45112 and previous config saved to /var/cache/conftool/dbconfig/20230306-190921-marostegui.json
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45111 and previous config saved to /var/cache/conftool/dbconfig/20230306-190054-marostegui.json
  • 18:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1036']
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45110 and previous config saved to /var/cache/conftool/dbconfig/20230306-185559-marostegui.json
  • 18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45109 and previous config saved to /var/cache/conftool/dbconfig/20230306-185537-marostegui.json
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45108 and previous config saved to /var/cache/conftool/dbconfig/20230306-185415-marostegui.json
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45107 and previous config saved to /var/cache/conftool/dbconfig/20230306-184547-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45106 and previous config saved to /var/cache/conftool/dbconfig/20230306-184030-marostegui.json
  • 18:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45105 and previous config saved to /var/cache/conftool/dbconfig/20230306-183908-marostegui.json
  • 18:38 mutante: phabricator - locked and archived project acl*discovery-repository-admins (T324171)
  • 18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45104 and previous config saved to /var/cache/conftool/dbconfig/20230306-183040-marostegui.json
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1036']
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45103 and previous config saved to /var/cache/conftool/dbconfig/20230306-182524-marostegui.json
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45102 and previous config saved to /var/cache/conftool/dbconfig/20230306-182508-marostegui.json
  • 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45101 and previous config saved to /var/cache/conftool/dbconfig/20230306-182447-marostegui.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45100 and previous config saved to /var/cache/conftool/dbconfig/20230306-182402-marostegui.json
  • 18:23 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45099 and previous config saved to /var/cache/conftool/dbconfig/20230306-181017-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45098 and previous config saved to /var/cache/conftool/dbconfig/20230306-180940-marostegui.json
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45097 and previous config saved to /var/cache/conftool/dbconfig/20230306-180249-marostegui.json
  • 18:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45096 and previous config saved to /var/cache/conftool/dbconfig/20230306-180228-marostegui.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45095 and previous config saved to /var/cache/conftool/dbconfig/20230306-175433-marostegui.json
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45094 and previous config saved to /var/cache/conftool/dbconfig/20230306-175254-marostegui.json
  • 17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45093 and previous config saved to /var/cache/conftool/dbconfig/20230306-175218-marostegui.json
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45092 and previous config saved to /var/cache/conftool/dbconfig/20230306-174721-marostegui.json
  • 17:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45091 and previous config saved to /var/cache/conftool/dbconfig/20230306-173927-marostegui.json
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45090 and previous config saved to /var/cache/conftool/dbconfig/20230306-173711-marostegui.json
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45089 and previous config saved to /var/cache/conftool/dbconfig/20230306-173350-marostegui.json
  • 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45088 and previous config saved to /var/cache/conftool/dbconfig/20230306-173328-marostegui.json
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45087 and previous config saved to /var/cache/conftool/dbconfig/20230306-173215-marostegui.json
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45086 and previous config saved to /var/cache/conftool/dbconfig/20230306-172205-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45085 and previous config saved to /var/cache/conftool/dbconfig/20230306-171821-marostegui.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45084 and previous config saved to /var/cache/conftool/dbconfig/20230306-171708-marostegui.json
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45083 and previous config saved to /var/cache/conftool/dbconfig/20230306-170657-marostegui.json
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45082 and previous config saved to /var/cache/conftool/dbconfig/20230306-170315-marostegui.json
  • 16:54 andrew@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759 (duration: 05m 19s)
  • 16:49 andrew@deploy2002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45081 and previous config saved to /var/cache/conftool/dbconfig/20230306-164808-marostegui.json
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45080 and previous config saved to /var/cache/conftool/dbconfig/20230306-164245-marostegui.json
  • 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-codfw
  • 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45079 and previous config saved to /var/cache/conftool/dbconfig/20230306-164158-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45078 and previous config saved to /var/cache/conftool/dbconfig/20230306-163806-marostegui.json
  • 16:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:32 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-codfw
  • 16:29 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45077 and previous config saved to /var/cache/conftool/dbconfig/20230306-162651-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45076 and previous config saved to /var/cache/conftool/dbconfig/20230306-161652-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45075 and previous config saved to /var/cache/conftool/dbconfig/20230306-161631-marostegui.json
  • 16:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45074 and previous config saved to /var/cache/conftool/dbconfig/20230306-161321-marostegui.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45073 and previous config saved to /var/cache/conftool/dbconfig/20230306-161144-marostegui.json
  • 16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2014.codfw.wmnet
  • 16:05 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2014.codfw.wmnet
  • 16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2014.codfw.wmnet
  • 16:04 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2014.codfw.wmnet
  • 16:03 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2013.codfw.wmnet
  • 16:02 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45072 and previous config saved to /var/cache/conftool/dbconfig/20230306-160124-marostegui.json
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45071 and previous config saved to /var/cache/conftool/dbconfig/20230306-155815-marostegui.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45070 and previous config saved to /var/cache/conftool/dbconfig/20230306-155638-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45069 and previous config saved to /var/cache/conftool/dbconfig/20230306-155428-marostegui.json
  • 15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45068 and previous config saved to /var/cache/conftool/dbconfig/20230306-155030-marostegui.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45067 and previous config saved to /var/cache/conftool/dbconfig/20230306-154618-marostegui.json
  • 15:45 otto@deploy2002: Finished deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided) (duration: 01m 25s)
  • 15:44 otto@deploy2002: Started deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided)
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45066 and previous config saved to /var/cache/conftool/dbconfig/20230306-154308-marostegui.json
  • 15:40 otto@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided) (duration: 01m 27s)
  • 15:39 otto@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided)
  • 15:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2014.codfw.wmnet
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45065 and previous config saved to /var/cache/conftool/dbconfig/20230306-153524-marostegui.json
  • 15:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45064 and previous config saved to /var/cache/conftool/dbconfig/20230306-153111-marostegui.json
  • 15:30 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
  • 15:30 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2014.codfw.wmnet
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45063 and previous config saved to /var/cache/conftool/dbconfig/20230306-152801-marostegui.json
  • 15:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
  • 15:26 mforns@deploy2002: Finished deploy [airflow-dags/analytics@2fa7484]: (no justification provided) (duration: 00m 17s)
  • 15:25 mforns@deploy2002: Started deploy [airflow-dags/analytics@2fa7484]: (no justification provided)
  • 15:25 volans@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:23 zabe@deploy2002: Finished scap: Backport for Add logo for azwikimedia and vewikimedia (T331177) (duration: 08m 33s)
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45062 and previous config saved to /var/cache/conftool/dbconfig/20230306-152017-marostegui.json
  • 15:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
  • 15:16 zabe@deploy2002: zabe: Backport for Add logo for azwikimedia and vewikimedia (T331177) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:14 zabe@deploy2002: Started scap: Backport for Add logo for azwikimedia and vewikimedia (T331177)
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45061 and previous config saved to /var/cache/conftool/dbconfig/20230306-150956-marostegui.json
  • 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:08 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 15:05 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45060 and previous config saved to /var/cache/conftool/dbconfig/20230306-150510-marostegui.json
  • 15:04 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:02 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45059 and previous config saved to /var/cache/conftool/dbconfig/20230306-150115-marostegui.json
  • 15:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45058 and previous config saved to /var/cache/conftool/dbconfig/20230306-150054-marostegui.json
  • 14:59 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45057 and previous config saved to /var/cache/conftool/dbconfig/20230306-145945-marostegui.json
  • 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45056 and previous config saved to /var/cache/conftool/dbconfig/20230306-145924-marostegui.json
  • 14:57 herron: failing grafana over to codfw T329073
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45055 and previous config saved to /var/cache/conftool/dbconfig/20230306-145052-marostegui.json
  • 14:50 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:49 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45054 and previous config saved to /var/cache/conftool/dbconfig/20230306-144547-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45053 and previous config saved to /var/cache/conftool/dbconfig/20230306-144417-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45051 and previous config saved to /var/cache/conftool/dbconfig/20230306-143546-marostegui.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45050 and previous config saved to /var/cache/conftool/dbconfig/20230306-143041-marostegui.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45049 and previous config saved to /var/cache/conftool/dbconfig/20230306-142910-marostegui.json
  • 14:25 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45048 and previous config saved to /var/cache/conftool/dbconfig/20230306-142039-marostegui.json
  • 14:16 sukhe: running authdns-update for CR 894652
  • 14:15 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45047 and previous config saved to /var/cache/conftool/dbconfig/20230306-141534-marostegui.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45046 and previous config saved to /var/cache/conftool/dbconfig/20230306-141404-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45045 and previous config saved to /var/cache/conftool/dbconfig/20230306-140533-marostegui.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45044 and previous config saved to /var/cache/conftool/dbconfig/20230306-140339-marostegui.json
  • 14:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45043 and previous config saved to /var/cache/conftool/dbconfig/20230306-140317-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45042 and previous config saved to /var/cache/conftool/dbconfig/20230306-134820-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45041 and previous config saved to /var/cache/conftool/dbconfig/20230306-134811-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1001.eqiad.wmnet,service=thanos-web
  • 13:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 13:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45040 and previous config saved to /var/cache/conftool/dbconfig/20230306-133451-marostegui.json
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-canary
  • 13:34 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45039 and previous config saved to /var/cache/conftool/dbconfig/20230306-133304-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45038 and previous config saved to /var/cache/conftool/dbconfig/20230306-131945-marostegui.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45037 and previous config saved to /var/cache/conftool/dbconfig/20230306-131758-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45036 and previous config saved to /var/cache/conftool/dbconfig/20230306-131545-marostegui.json
  • 13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45035 and previous config saved to /var/cache/conftool/dbconfig/20230306-131214-marostegui.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45034 and previous config saved to /var/cache/conftool/dbconfig/20230306-130933-marostegui.json
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 13:09 moritzm: rearmed keyholder on deploy1002 following reboot
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45033 and previous config saved to /var/cache/conftool/dbconfig/20230306-130854-marostegui.json
  • 13:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1002.eqiad.wmnet with OS bullseye
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45032 and previous config saved to /var/cache/conftool/dbconfig/20230306-130438-marostegui.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45031 and previous config saved to /var/cache/conftool/dbconfig/20230306-125707-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45030 and previous config saved to /var/cache/conftool/dbconfig/20230306-125348-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45029 and previous config saved to /var/cache/conftool/dbconfig/20230306-124932-marostegui.json
  • 12:48 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
  • 12:46 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45028 and previous config saved to /var/cache/conftool/dbconfig/20230306-124341-marostegui.json
  • 12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45027 and previous config saved to /var/cache/conftool/dbconfig/20230306-124308-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45026 and previous config saved to /var/cache/conftool/dbconfig/20230306-124200-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45025 and previous config saved to /var/cache/conftool/dbconfig/20230306-123841-marostegui.json
  • 12:32 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1002.eqiad.wmnet with OS bullseye
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45024 and previous config saved to /var/cache/conftool/dbconfig/20230306-122801-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45023 and previous config saved to /var/cache/conftool/dbconfig/20230306-122654-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45022 and previous config saved to /var/cache/conftool/dbconfig/20230306-122546-marostegui.json
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45021 and previous config saved to /var/cache/conftool/dbconfig/20230306-122524-marostegui.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45020 and previous config saved to /var/cache/conftool/dbconfig/20230306-122334-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45019 and previous config saved to /var/cache/conftool/dbconfig/20230306-121255-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45018 and previous config saved to /var/cache/conftool/dbconfig/20230306-121018-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45017 and previous config saved to /var/cache/conftool/dbconfig/20230306-120328-marostegui.json
  • 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45016 and previous config saved to /var/cache/conftool/dbconfig/20230306-115748-marostegui.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45015 and previous config saved to /var/cache/conftool/dbconfig/20230306-115511-marostegui.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45014 and previous config saved to /var/cache/conftool/dbconfig/20230306-115201-marostegui.json
  • 11:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45013 and previous config saved to /var/cache/conftool/dbconfig/20230306-115140-marostegui.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P45012 and previous config saved to /var/cache/conftool/dbconfig/20230306-114354-marostegui.json
  • 11:42 vgutierrez: enable ESI testing in cp4044 - T308799
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45011 and previous config saved to /var/cache/conftool/dbconfig/20230306-114004-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45010 and previous config saved to /var/cache/conftool/dbconfig/20230306-113856-marostegui.json
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45009 and previous config saved to /var/cache/conftool/dbconfig/20230306-113835-marostegui.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45008 and previous config saved to /var/cache/conftool/dbconfig/20230306-113633-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45007 and previous config saved to /var/cache/conftool/dbconfig/20230306-112847-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45006 and previous config saved to /var/cache/conftool/dbconfig/20230306-112328-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45005 and previous config saved to /var/cache/conftool/dbconfig/20230306-112126-marostegui.json
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45003 and previous config saved to /var/cache/conftool/dbconfig/20230306-111340-marostegui.json
  • 11:09 jbond: enable puppet fleet wide post reboot of puppetdb
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45002 and previous config saved to /var/cache/conftool/dbconfig/20230306-110822-marostegui.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45001 and previous config saved to /var/cache/conftool/dbconfig/20230306-110620-marostegui.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45000 and previous config saved to /var/cache/conftool/dbconfig/20230306-110031-marostegui.json
  • 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44999 and previous config saved to /var/cache/conftool/dbconfig/20230306-110009-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44998 and previous config saved to /var/cache/conftool/dbconfig/20230306-105834-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44997 and previous config saved to /var/cache/conftool/dbconfig/20230306-105315-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44996 and previous config saved to /var/cache/conftool/dbconfig/20230306-105206-marostegui.json
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44995 and previous config saved to /var/cache/conftool/dbconfig/20230306-105145-marostegui.json
  • 10:49 jbond: disable puppet fleet wide to reboot puppetdb
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44994 and previous config saved to /var/cache/conftool/dbconfig/20230306-104503-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44993 and previous config saved to /var/cache/conftool/dbconfig/20230306-103639-marostegui.json
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44992 and previous config saved to /var/cache/conftool/dbconfig/20230306-103525-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44991 and previous config saved to /var/cache/conftool/dbconfig/20230306-103503-marostegui.json
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44990 and previous config saved to /var/cache/conftool/dbconfig/20230306-102956-marostegui.json
  • 10:29 vgutierrez: enable haproxy systemd service unit hardening in cp4045 - T323944
  • 10:29 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1001.eqiad.wmnet with OS bullseye
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44989 and previous config saved to /var/cache/conftool/dbconfig/20230306-102132-marostegui.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44988 and previous config saved to /var/cache/conftool/dbconfig/20230306-101957-marostegui.json
  • 10:18 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:17 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44987 and previous config saved to /var/cache/conftool/dbconfig/20230306-101450-marostegui.json
  • 10:12 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:12 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44986 and previous config saved to /var/cache/conftool/dbconfig/20230306-100901-marostegui.json
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44985 and previous config saved to /var/cache/conftool/dbconfig/20230306-100840-marostegui.json
  • 10:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
  • 10:07 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44984 and previous config saved to /var/cache/conftool/dbconfig/20230306-100626-marostegui.json
  • 10:05 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44983 and previous config saved to /var/cache/conftool/dbconfig/20230306-100450-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44982 and previous config saved to /var/cache/conftool/dbconfig/20230306-100417-marostegui.json
  • 10:04 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44981 and previous config saved to /var/cache/conftool/dbconfig/20230306-100356-marostegui.json
  • 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host deploy1002.eqiad.wmnet
  • 09:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44980 and previous config saved to /var/cache/conftool/dbconfig/20230306-095333-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44979 and previous config saved to /var/cache/conftool/dbconfig/20230306-094944-marostegui.json
  • 09:49 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:49 nfraison@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44978 and previous config saved to /var/cache/conftool/dbconfig/20230306-094849-marostegui.json
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44977 and previous config saved to /var/cache/conftool/dbconfig/20230306-094341-root.json
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44976 and previous config saved to /var/cache/conftool/dbconfig/20230306-093827-marostegui.json
  • 09:36 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44975 and previous config saved to /var/cache/conftool/dbconfig/20230306-093343-marostegui.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44974 and previous config saved to /var/cache/conftool/dbconfig/20230306-092836-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44973 and previous config saved to /var/cache/conftool/dbconfig/20230306-092557-marostegui.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44972 and previous config saved to /var/cache/conftool/dbconfig/20230306-092536-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44971 and previous config saved to /var/cache/conftool/dbconfig/20230306-092320-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44970 and previous config saved to /var/cache/conftool/dbconfig/20230306-091836-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44969 and previous config saved to /var/cache/conftool/dbconfig/20230306-091733-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44968 and previous config saved to /var/cache/conftool/dbconfig/20230306-091728-marostegui.json
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44967 and previous config saved to /var/cache/conftool/dbconfig/20230306-091706-marostegui.json
  • 09:14 dcausse: depooling & restarting blazegraph on wdqs1006 (stuck for 48+ hours)
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44966 and previous config saved to /var/cache/conftool/dbconfig/20230306-091330-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44965 and previous config saved to /var/cache/conftool/dbconfig/20230306-091030-marostegui.json
  • 09:06 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001 (duration: 00m 12s)
  • 09:06 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001
  • 09:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44964 and previous config saved to /var/cache/conftool/dbconfig/20230306-090416-marostegui.json
  • 09:02 vgutierrez: disabling haproxy systemd service unit hardening in ulsfo - T323944
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44963 and previous config saved to /var/cache/conftool/dbconfig/20230306-090200-marostegui.json
  • 09:00 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002 (duration: 00m 07s)
  • 09:00 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44962 and previous config saved to /var/cache/conftool/dbconfig/20230306-085825-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44961 and previous config saved to /var/cache/conftool/dbconfig/20230306-085523-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44960 and previous config saved to /var/cache/conftool/dbconfig/20230306-084910-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44959 and previous config saved to /var/cache/conftool/dbconfig/20230306-084653-marostegui.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44958 and previous config saved to /var/cache/conftool/dbconfig/20230306-084320-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44957 and previous config saved to /var/cache/conftool/dbconfig/20230306-084017-marostegui.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44956 and previous config saved to /var/cache/conftool/dbconfig/20230306-083403-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44955 and previous config saved to /var/cache/conftool/dbconfig/20230306-083147-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44954 and previous config saved to /var/cache/conftool/dbconfig/20230306-083038-marostegui.json
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:28 moritzm: rolling restart of Apache on mw* to pick up apr-util security updates
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44953 and previous config saved to /var/cache/conftool/dbconfig/20230306-082815-root.json
  • 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44952 and previous config saved to /var/cache/conftool/dbconfig/20230306-082645-marostegui.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 08:22 kartik@deploy2002: Finished scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) (duration: 19m 12s)
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44951 and previous config saved to /var/cache/conftool/dbconfig/20230306-081857-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44950 and previous config saved to /var/cache/conftool/dbconfig/20230306-081711-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44949 and previous config saved to /var/cache/conftool/dbconfig/20230306-081639-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44948 and previous config saved to /var/cache/conftool/dbconfig/20230306-081310-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44947 and previous config saved to /var/cache/conftool/dbconfig/20230306-081305-marostegui.json
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44946 and previous config saved to /var/cache/conftool/dbconfig/20230306-081244-marostegui.json
  • 08:12 kartik@deploy2002: kartik: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44945 and previous config saved to /var/cache/conftool/dbconfig/20230306-081138-marostegui.json
  • 08:02 kartik@deploy2002: Started scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482)
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44944 and previous config saved to /var/cache/conftool/dbconfig/20230306-080132-marostegui.json
  • 08:00 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44943 and previous config saved to /var/cache/conftool/dbconfig/20230306-075737-marostegui.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44942 and previous config saved to /var/cache/conftool/dbconfig/20230306-075632-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122', diff saved to https://phabricator.wikimedia.org/P44941 and previous config saved to /var/cache/conftool/dbconfig/20230306-074830-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44940 and previous config saved to /var/cache/conftool/dbconfig/20230306-074626-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44939 and previous config saved to /var/cache/conftool/dbconfig/20230306-074231-marostegui.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44938 and previous config saved to /var/cache/conftool/dbconfig/20230306-074125-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44937 and previous config saved to /var/cache/conftool/dbconfig/20230306-073707-marostegui.json
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44936 and previous config saved to /var/cache/conftool/dbconfig/20230306-073119-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44935 and previous config saved to /var/cache/conftool/dbconfig/20230306-072724-marostegui.json
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2094.codfw.wmnet
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:22 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44934 and previous config saved to /var/cache/conftool/dbconfig/20230306-072132-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2094.codfw.wmnet
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44933 and previous config saved to /var/cache/conftool/dbconfig/20230306-070814-marostegui.json
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:29 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/public to dumpsdata1007, no bandwidth cap
  • 06:03 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/private to dumpsdata1007 (did this for 1006 about an hour ago, forgot to log), no bandwidth cap

2023-03-04

  • 14:56 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 02m 17s)
  • 14:53 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
  • 14:44 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 08m 56s)
  • 14:35 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
  • 14:32 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 00m 46s)
  • 14:31 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: (no justification provided)
  • 06:09 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1006, no bandwidth cap

2023-03-03

  • 20:58 inflatador: bking@cumin2002 persistently unban all elastic nodes in eqiad T322082
  • 20:55 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
  • 20:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2070.codfw.wmnet with OS bullseye
  • 20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:33 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
  • 20:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:05 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
  • 19:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:40 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:39 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:36 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 19:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 19:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 18:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
  • 18:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
  • 18:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2070.codfw.wmnet with OS bullseye
  • 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
  • 18:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:17 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:16 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
  • 18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 krinkle@deploy2002: Synchronized wmf-config/mc.php: Ic55725: Prepare mc.php for next week train (duration: 07m 39s)
  • 17:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
  • 17:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 17:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
  • 17:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
  • 17:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:01 inflatador: bking@cumin2002 ban elastic1059-1066 T322082
  • 16:56 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 16:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:44 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
  • 16:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
  • 16:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:38 bking@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:37 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
  • 16:36 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 16:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
  • 16:09 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
  • 15:53 mforns@deploy2002: Finished deploy [airflow-dags/analytics@ad17aa9]: (no justification provided) (duration: 00m 22s)
  • 15:53 mforns@deploy2002: Started deploy [airflow-dags/analytics@ad17aa9]: (no justification provided)
  • 15:47 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:43 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance (duration: 00m 21s)
  • 15:42 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance
  • 15:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:36 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:33 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:32 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:32 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:26 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:24 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:24 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:21 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1004.wikimedia.org
  • 15:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1004.wikimedia.org on all recursors
  • 15:02 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1004.wikimedia.org on all recursors
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
  • 14:59 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 14:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
  • 14:56 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1004.wikimedia.org
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1003.wikimedia.org
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1003.wikimedia.org on all recursors
  • 14:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1003.wikimedia.org on all recursors
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: rerack
  • 14:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: rerack
  • 14:24 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1003.wikimedia.org
  • 14:09 inflatador: bking@cumin2002 banning elastic1053-59 from the cluster in preparation for T322082
  • 14:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:51 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 13:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
  • 13:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:29 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:13 moritzm: imported PHP 7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 to component/icu67 (build of PHP against co-installable ICU67) T329491
  • 10:39 vgutierrez: restart ntp.service in dns2001
  • 10:30 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 10:25 moritzm: installing 5.10.162 kernels on buster systems running Linux 5.10
  • 10:12 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
  • 10:12 root@cumin2002: START - Cookbook sre.idm.logout Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
  • 09:56 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 1119 hosts
  • 09:55 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 1119 hosts
  • 09:54 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 909 hosts
  • 09:54 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 909 hosts
  • 09:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 09:45 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 09:10 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:10 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:07 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:01 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:54 elukey: restart pybal on lvs2010 (standby) and then on lvs2009 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
  • 08:48 elukey: restart pybal on lvs1020 (standby) and then on lvs1019 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
  • 08:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:36 vgutierrez: restarting ntp on dns1001
  • 07:29 elukey: truncate /var/log/auth.log.1 on krb1001 to free space (root partition almost filled up)
  • 01:12 mutante: releases1002: deleting /usr/local/sbin/sync-srv-org-wikimedia-reprepro-releases1002.eqiad.wmnet, which confusingly contains an rsync command that syncs from releases1001, which no longer exists - T330960
  • 00:13 mutante: switching releases.wikimedia.org from eqiad to codfw - T330960

2023-03-02

  • 23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2001-2003].codfw.wmnet
  • 23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 22:45 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 22:37 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 22:11 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts wdqs[2001-2003].codfw.wmnet
  • 21:22 TheresNoTime: close UTC late backport and config training
  • 21:10 samtar@deploy2002: Finished scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) (duration: 08m 03s)
  • 21:04 samtar@deploy2002: superpes and samtar: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2001.wikimedia.org with OS bullseye
  • 21:02 samtar@deploy2002: Started scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051)
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1001.wikimedia.org with OS bullseye
  • 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
  • 20:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 20:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 20:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2001.wikimedia.org with OS bullseye
  • 20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
  • 20:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
  • 20:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 20:08 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1001.wikimedia.org with OS bullseye
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2014.codfw.wmnet with OS bullseye
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 19:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 19:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2014.codfw.wmnet with OS bullseye
  • 18:10 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:08 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:09 oblivian@deploy2002: Finished scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) (duration: 09m 16s)
  • 17:01 oblivian@deploy2002: oblivian: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:59 oblivian@deploy2002: Started scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942)
  • 15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
  • 15:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
  • 15:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:41 jynus: restart db2099 T330218
  • 14:32 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:29 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove unused Wikibase config variables (T330410) (duration: 08m 41s)
  • 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Remove unused Wikibase config variables (T330410) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove unused Wikibase config variables (T330410)
  • 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1010.eqiad.wmnet with OS bullseye
  • 13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:40 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:48 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:48 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:47 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:47 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:46 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:46 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:00 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:42 claime: Running authdns-update for 893675
  • 10:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 10:16 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test (duration: 00m 12s)
  • 10:16 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test
  • 10:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 10:05 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 10:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:50 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 09:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
  • 09:47 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 09:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
  • 09:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 09:28 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1010.eqiad.wmnet with OS bullseye
  • 09:20 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.25 refs T325588
  • 09:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1010']
  • 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 09:06 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
  • 09:04 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1010']
  • 08:58 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
  • 08:58 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1010
  • 08:57 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1010
  • 08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
  • 08:46 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
  • 08:39 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 08:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:34 marostegui: Stop MySQL on db2093 T330827
  • 08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 08:18 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 08:15 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
  • 08:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 08:05 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 07:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 07:48 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:48 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:48 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:47 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:38 apergos: started rsync of xmldatadumps/private from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
  • 07:38 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:36 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 07:17 marostegui: Stop MySQL on db2095 T330975
  • 01:23 mutante: doc2001 - stopping apache2 to test alerting - active server is doc1002 but should be switched T327973 T330963
  • 01:08 mutante: releases2002 - stopping apache2 to test alerting (active server is 1002 but should be switched) T327975 T330960
  • 00:28 mutante: planet1002 - stopping apache2 to test alerting (active host is codfw)

2023-03-01

  • 23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1002.wikimedia.org with OS bullseye
  • 23:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
  • 22:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
  • 22:52 mutante: apt1001 - systemctl reset-failed T328907
  • 22:45 mforns@deploy2002: Finished deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided) (duration: 00m 23s)
  • 22:45 mforns@deploy2002: Started deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided)
  • 22:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1002.wikimedia.org with OS bullseye
  • 22:42 mforns@deploy2002: Finished deploy [airflow-dags/analytics@51e92b1]: (no justification provided) (duration: 00m 21s)
  • 22:42 mforns@deploy2002: Started deploy [airflow-dags/analytics@51e92b1]: (no justification provided)
  • 21:41 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a] (duration: 01m 22s)
  • 21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a]
  • 21:39 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a] (duration: 00m 07s)
  • 21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a]
  • 21:38 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a] (duration: 10m 55s)
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2002.wikimedia.org with OS bullseye
  • 21:27 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a]
  • 21:23 TheresNoTime: closing UTC late backport window
  • 21:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
  • 21:16 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
  • 21:11 samtar@deploy2002: Finished scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) (duration: 09m 30s)
  • 21:04 samtar@deploy2002: superpes and samtar: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:02 samtar@deploy2002: Started scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047)
  • 21:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2002.wikimedia.org with OS bullseye
  • 20:43 zabe: move rev_comment_id migration screens from mwmaint1002 to mwmaint2002 # T275246
  • 19:47 brett: re-adding dns3001 to next-hop routing via juniper - T321309
  • 19:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3001.wikimedia.org with OS bullseye
  • 19:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
  • 19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
  • 18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3001.wikimedia.org with OS bullseye
  • 18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS buster
  • 17:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
  • 17:41 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
  • 17:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test (duration: 00m 05s)
  • 17:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test
  • 17:26 root@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 17:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 17:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 17:06 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 17:05 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 16:56 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 16:28 brett: Remove dns3001 DNS request routing via juniper - T321309
  • 16:28 XioNoX: rollback port 80 block in esams - T330683
  • 16:26 taavi@deploy2002: Finished scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) (duration: 08m 23s)
  • 16:21 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:20 taavi@deploy2002: taavi: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 16:19 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:19 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:18 taavi@deploy2002: Started scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031)
  • 16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 16:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 16:02 bblack: cr[23]-esams: manually adding brett's ssh-rsa to match https://gerrit.wikimedia.org/r/c/operations/homer/public/+/892551
  • 16:01 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
  • 16:00 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 15:57 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:57 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:44 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:39 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:35 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:32 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 15:28 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:22 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-canary
  • 15:18 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-canary
  • 15:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:11 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 15:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 15:06 hashar: Restarting Apache on Gerrit host
  • 15:04 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:57 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 14:52 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1005
  • 14:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 14:45 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1005
  • 14:34 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:33 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 14:32 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-canary
  • 14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 14:29 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-canary
  • 14:27 taavi: re-start persistRevisionThreadItems.php on itwiki from P44912 after DC switchover T315510
  • 14:27 claime: End mediawiki datacenter switchover - T327920
  • 14:26 cgoubert@deploy2002: Finished scap: Backport for debug.json: List primary DC servers first (T327920) (duration: 07m 54s)
  • 14:20 cgoubert@deploy2002: cgoubert: Backport for debug.json: List primary DC servers first (T327920) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:18 cgoubert@deploy2002: Started scap: Backport for debug.json: List primary DC servers first (T327920)
  • 14:16 claime: Removing scap lock - T327920
  • 14:15 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2122 weight', diff saved to https://phabricator.wikimedia.org/P44913 and previous config saved to /var/cache/conftool/dbconfig/20230301-141414-marostegui.json
  • 14:10 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 14:09 claime: Phase 9.5 DNS records for new database masters updated - T327920
  • 14:08 claime: Phase 9.5 Update DNS records for new database masters - T327920
  • 14:07 taavi: test
  • 14:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 14:05 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 14:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:02 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:02 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-03-01 14:02:09.272468
  • 14:00 cgoubert@cumin1001: MediaWiki read-only period starts at: 2023-03-01 14:00:10.075167
  • 14:00 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:56 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:52 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:51 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:51 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:49 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:49 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
  • 13:40 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
  • 13:40 claime: Starting mediawiki datacenter switchover step 0 - T327920
  • 13:37 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 claime: Locking scap deployments for datacenter switchover - T327920
  • 13:30 krinkle@deploy2002: Synchronized wmf-config/: I3beefb filebackend cleanup (duration: 07m 13s)
  • 13:19 krinkle@deploy2002: Synchronized wmf-config/: Ie063fb - Remove config for former Rdbms logging (duration: 07m 39s)
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 13:17 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 13:10 claime: Adding scheduled maintenance for switchover to statuspage - T327920
  • 13:09 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 12:40 marostegui: Upgrade db2183 to 10.6 T330861
  • 12:28 moritzm: upgrade mwmaint to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 11:58 moritzm: upgrade parse/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 11:09 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:08 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:07 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1010.eqiad.wmnet
  • 11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:00 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:57 moritzm: upgrade cloudweb to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 10:56 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 10:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 10:32 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 10:30 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 10:25 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 10:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:11 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:03 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 10:02 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1010.eqiad.wmnet
  • 09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 09:58 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 09:57 marostegui: Stop db1117:3325 and db1176 T329478
  • 09:57 root@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8309
  • 09:47 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8309
  • 09:39 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad
  • 09:38 moritzm: installing tiff security updates
  • 09:31 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro
  • 09:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 09:30 jnuche@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.25 refs T325588 (duration: 07m 48s)
  • 09:26 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 09:23 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.25 refs T325588
  • 09:15 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:15 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:58 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:56 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:51 moritzm: upgrade mw/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 08:45 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:42 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1003.eqiad.wmnet with OS bullseye
  • 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1002.eqiad.wmnet with OS bullseye
  • 08:40 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1001.eqiad.wmnet with OS bullseye
  • 08:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 918 hosts
  • 08:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 918 hosts
  • 08:35 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 1110 hosts
  • 08:34 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 1110 hosts
  • 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
  • 08:26 jynus: stopping db2184 for testing mariadb 10.6 recovery workflow T319383
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
  • 08:15 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
  • 08:14 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1001.eqiad.wmnet with OS bullseye
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1002.eqiad.wmnet with OS bullseye
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1003.eqiad.wmnet with OS bullseye
  • 08:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: T330758
  • 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: T330758
  • 06:14 marostegui: Stop MySQL on db2094 T330828
  • 05:37 marostegui: Stop mysql on codfw sanitarium host db2095 (s2, s7, s6, s4) to clone db2187 T326596
  • 05:37 eileen: civicrm upgraded from ffc16d2d to fe2c06f6
  • 00:25 ejegg: civicrm rolled back from d199694e to ffc16d2d
  • 00:06 zabe@deploy2002: Finished scap: T198673 (duration: 07m 25s)
