Server Admin Log/Archive 73

From Wikitech


2023-11-30

  • 23:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2097.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2099.codfw.wmnet with OS bookworm
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2096.codfw.wmnet with OS bookworm
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2097.codfw.wmnet with reason: host reimage
  • 23:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1105.eqiad.wmnet with reason: host reimage
  • 23:50 krinkle@deploy2002: Synchronized docroot/noc/: (no justification provided) (duration: 08m 28s)
  • 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 23:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1105.eqiad.wmnet with reason: host reimage
  • 23:45 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2098.codfw.wmnet with OS bookworm
  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54056 and previous config saved to /var/cache/conftool/dbconfig/20231130-234322-arnaudb.json
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2097.codfw.wmnet with OS bookworm
  • 23:35 foks: removing 1 file for legal compliance
  • 23:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2096.codfw.wmnet with reason: host reimage
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2096.codfw.wmnet with reason: host reimage
  • 23:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 23:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 23:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P54055 and previous config saved to /var/cache/conftool/dbconfig/20231130-232815-arnaudb.json
  • 23:18 foks: removing 1 file for legal compliance
  • 23:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bookworm
  • 23:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P54054 and previous config saved to /var/cache/conftool/dbconfig/20231130-231309-arnaudb.json
  • 23:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2096.codfw.wmnet with OS bookworm
  • 23:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2094.codfw.wmnet with OS bookworm
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2095.codfw.wmnet with OS bookworm
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 foks: removing 5 files for legal compliance
  • 22:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54053 and previous config saved to /var/cache/conftool/dbconfig/20231130-225802-arnaudb.json
  • 22:46 wfan: payments-wiki upgraded from 7feabffe to b37ab50e
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2095.codfw.wmnet with reason: host reimage
  • 22:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2095.codfw.wmnet with reason: host reimage
  • 22:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54051 and previous config saved to /var/cache/conftool/dbconfig/20231130-222836-arnaudb.json
  • 22:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54050 and previous config saved to /var/cache/conftool/dbconfig/20231130-222814-arnaudb.json
  • 22:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2095.codfw.wmnet with OS bookworm
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2093.codfw.wmnet with OS bookworm
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P54048 and previous config saved to /var/cache/conftool/dbconfig/20231130-221308-arnaudb.json
  • 22:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P54047 and previous config saved to /var/cache/conftool/dbconfig/20231130-215759-arnaudb.json
  • 21:55 dancy@deploy2002: Finished scap: Backport for Increase "large" font-size option for client-preferences (T351693) (duration: 10m 01s)
  • 21:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 21:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 21:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2093.codfw.wmnet with reason: host reimage
  • 21:49 dancy@deploy2002: jdrewniak and dancy: Continuing with sync
  • 21:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2093.codfw.wmnet with reason: host reimage
  • 21:46 dancy@deploy2002: jdrewniak and dancy: Backport for Increase "large" font-size option for client-preferences (T351693) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bookworm
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2092.codfw.wmnet with OS bookworm
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:45 dancy@deploy2002: Started scap: Backport for Increase "large" font-size option for client-preferences (T351693)
  • 21:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54046 and previous config saved to /var/cache/conftool/dbconfig/20231130-214252-arnaudb.json
  • 21:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2093.codfw.wmnet with OS bookworm
  • 21:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2092.codfw.wmnet with reason: host reimage
  • 21:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2092.codfw.wmnet with reason: host reimage
  • 21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1105.eqiad.wmnet with OS bookworm
  • 21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
  • 21:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54045 and previous config saved to /var/cache/conftool/dbconfig/20231130-211412-arnaudb.json
  • 21:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 21:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54044 and previous config saved to /var/cache/conftool/dbconfig/20231130-211349-arnaudb.json
  • 21:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2092.codfw.wmnet with OS bookworm
  • 20:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P54043 and previous config saved to /var/cache/conftool/dbconfig/20231130-205843-arnaudb.json
  • 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P54042 and previous config saved to /var/cache/conftool/dbconfig/20231130-204336-arnaudb.json
  • 20:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1104.eqiad.wmnet with OS bookworm
  • 20:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1106.eqiad.wmnet with OS bookworm
  • 20:37 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1103.eqiad.wmnet with OS bookworm
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54041 and previous config saved to /var/cache/conftool/dbconfig/20231130-202830-arnaudb.json
  • 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1103.eqiad.wmnet with reason: host reimage
  • 20:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1104.eqiad.wmnet with reason: host reimage
  • 20:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1106.eqiad.wmnet with reason: host reimage
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1106.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1103.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1104.eqiad.wmnet with reason: host reimage
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54040 and previous config saved to /var/cache/conftool/dbconfig/20231130-200409-arnaudb.json
  • 20:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54039 and previous config saved to /var/cache/conftool/dbconfig/20231130-200342-arnaudb.json
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1106.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1103.eqiad.wmnet with OS bookworm
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1104.eqiad.wmnet with OS bookworm
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1104']
  • 19:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P54037 and previous config saved to /var/cache/conftool/dbconfig/20231130-194835-arnaudb.json
  • 19:41 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1038']
  • 19:37 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1037']
  • 19:37 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1036']
  • 19:34 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1035']
  • 19:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2028.codfw.wmnet
  • 19:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P54036 and previous config saved to /var/cache/conftool/dbconfig/20231130-193329-arnaudb.json
  • 19:33 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1038']
  • 19:31 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1037']
  • 19:30 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1036']
  • 19:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 19:29 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1035']
  • 19:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:28 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti1035']
  • 19:28 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1035']
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1103']
  • 19:25 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2028.codfw.wmnet
  • 19:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:22 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1106']
  • 19:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:21 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 19:21 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1105']
  • 19:20 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:20 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 19:20 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1105']
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1106']
  • 19:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54035 and previous config saved to /var/cache/conftool/dbconfig/20231130-191822-arnaudb.json
  • 19:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 19:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1103']
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1104']
  • 19:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 19:09 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 18:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:50 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:49 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 18:49 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 18:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54034 and previous config saved to /var/cache/conftool/dbconfig/20231130-184900-arnaudb.json
  • 18:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 18:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:22 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 18:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 18:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54033 and previous config saved to /var/cache/conftool/dbconfig/20231130-182155-arnaudb.json
  • 18:15 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:14 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:12 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P54032 and previous config saved to /var/cache/conftool/dbconfig/20231130-180648-arnaudb.json
  • 18:02 mutante: planet2003 - revoking old puppet cert, following the "fix forward" steps from T349619 - puppet running again
  • 17:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P54031 and previous config saved to /var/cache/conftool/dbconfig/20231130-175141-arnaudb.json
  • 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54030 and previous config saved to /var/cache/conftool/dbconfig/20231130-173635-arnaudb.json
  • 17:27 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:26 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:23 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:14 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54029 and previous config saved to /var/cache/conftool/dbconfig/20231130-170713-arnaudb.json
  • 17:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54028 and previous config saved to /var/cache/conftool/dbconfig/20231130-170650-arnaudb.json
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ips to restbase servers in codfw - jhancock@cumin2002"
  • 17:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ips to restbase servers in codfw - jhancock@cumin2002"
  • 16:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P54027 and previous config saved to /var/cache/conftool/dbconfig/20231130-165144-arnaudb.json
  • 16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P54026 and previous config saved to /var/cache/conftool/dbconfig/20231130-163637-arnaudb.json
  • 16:33 ladsgroup@deploy2002: Finished scap: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444) (duration: 09m 45s)
  • 16:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:26 ladsgroup@deploy2002: ladsgroup: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:23 ladsgroup@deploy2002: Started scap: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444)
  • 16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54025 and previous config saved to /var/cache/conftool/dbconfig/20231130-162131-arnaudb.json
  • 15:54 moritzm: installing stunnel4 bugfix updates from bookworm point release
  • 15:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54024 and previous config saved to /var/cache/conftool/dbconfig/20231130-155251-arnaudb.json
  • 15:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54023 and previous config saved to /var/cache/conftool/dbconfig/20231130-154227-arnaudb.json
  • 15:36 moritzm: installing minizip security updates
  • 15:33 sukhe: clean-up /etc/hosts on A:dns-rec to remove entries populated by host_core: T347054
  • 15:31 moritzm: installing dbus security updates on buster
  • 15:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P54022 and previous config saved to /var/cache/conftool/dbconfig/20231130-152721-arnaudb.json
  • 15:21 moritzm: installing libbsd bugfix updates from Bullseye point release
  • 15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P54021 and previous config saved to /var/cache/conftool/dbconfig/20231130-151214-arnaudb.json
  • 15:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 15:07 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'change es3 master back to es2034', diff saved to https://phabricator.wikimedia.org/P54020 and previous config saved to /var/cache/conftool/dbconfig/20231130-150712-arnaudb.json
  • 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54019 and previous config saved to /var/cache/conftool/dbconfig/20231130-150434-arnaudb.json
  • 14:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54018 and previous config saved to /var/cache/conftool/dbconfig/20231130-145707-arnaudb.json
  • 14:54 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1106']
  • 14:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 14:53 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1105']
  • 14:50 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 14:50 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1105']
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54017 and previous config saved to /var/cache/conftool/dbconfig/20231130-144929-arnaudb.json
  • 14:49 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54016 and previous config saved to /var/cache/conftool/dbconfig/20231130-144854-arnaudb.json
  • 14:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54015 and previous config saved to /var/cache/conftool/dbconfig/20231130-144831-arnaudb.json
  • 14:48 godog: roll-restart prometheus/ops in eqiad/codfw to apply new size-based retention - T351179
  • 14:46 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:36 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54013 and previous config saved to /var/cache/conftool/dbconfig/20231130-143424-arnaudb.json
  • 14:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P54012 and previous config saved to /var/cache/conftool/dbconfig/20231130-143325-arnaudb.json
  • 14:31 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:29 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
  • 14:27 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 14:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 14:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1107
  • 14:26 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1107
  • 14:25 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1106
  • 14:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1106
  • 14:24 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1105
  • 14:24 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1105
  • 14:23 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1103
  • 14:23 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 14:23 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:22 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1103
  • 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
  • 14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54011 and previous config saved to /var/cache/conftool/dbconfig/20231130-141919-arnaudb.json
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P54010 and previous config saved to /var/cache/conftool/dbconfig/20231130-141815-arnaudb.json
  • 14:15 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host planet2003.codfw.wmnet
  • 14:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host planet2003.codfw.wmnet
  • 14:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM stewards2001.codfw.wmnet
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54009 and previous config saved to /var/cache/conftool/dbconfig/20231130-140414-arnaudb.json
  • 14:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM stewards2001.codfw.wmnet
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54008 and previous config saved to /var/cache/conftool/dbconfig/20231130-140308-arnaudb.json
  • 13:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54007 and previous config saved to /var/cache/conftool/dbconfig/20231130-135453-arnaudb.json
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P54006 and previous config saved to /var/cache/conftool/dbconfig/20231130-135410-arnaudb.json
  • 13:49 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54005 and previous config saved to /var/cache/conftool/dbconfig/20231130-134909-arnaudb.json
  • 13:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P54004 and previous config saved to /var/cache/conftool/dbconfig/20231130-133904-arnaudb.json
  • 13:34 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54003 and previous config saved to /var/cache/conftool/dbconfig/20231130-133404-arnaudb.json
  • 13:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P54002 and previous config saved to /var/cache/conftool/dbconfig/20231130-132357-arnaudb.json
  • 13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54001 and previous config saved to /var/cache/conftool/dbconfig/20231130-131859-arnaudb.json
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1126.eqiad.wmnet
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1126.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 13:11 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1126.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 13:09 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P54000 and previous config saved to /var/cache/conftool/dbconfig/20231130-130851-arnaudb.json
  • 13:06 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:05 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host schema2004.codfw.wmnet
  • 13:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53999 and previous config saved to /var/cache/conftool/dbconfig/20231130-130354-arnaudb.json
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1126.eqiad.wmnet
  • 13:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P53998 and previous config saved to /var/cache/conftool/dbconfig/20231130-130136-arnaudb.json
  • 13:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53997 and previous config saved to /var/cache/conftool/dbconfig/20231130-130113-arnaudb.json
  • 12:59 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 12:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host schema2004.codfw.wmnet
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53996 and previous config saved to /var/cache/conftool/dbconfig/20231130-124849-arnaudb.json
  • 12:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P53995 and previous config saved to /var/cache/conftool/dbconfig/20231130-124607-arnaudb.json
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 is depooled', diff saved to https://phabricator.wikimedia.org/P53994 and previous config saved to /var/cache/conftool/dbconfig/20231130-124110-arnaudb.json
  • 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'change es3 master to es2029 as es2034 will reboot', diff saved to https://phabricator.wikimedia.org/P53993 and previous config saved to /var/cache/conftool/dbconfig/20231130-124050-arnaudb.json
  • 12:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reboot
  • 12:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reboot
  • 12:37 arnaudb@cumin1001: dbctl commit (dc=all): 'change es2 master to es2033 after reboot', diff saved to https://phabricator.wikimedia.org/P53992 and previous config saved to /var/cache/conftool/dbconfig/20231130-123752-arnaudb.json
  • 12:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P53991 and previous config saved to /var/cache/conftool/dbconfig/20231130-123100-arnaudb.json
  • 12:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2059.codfw.wmnet
  • 12:26 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2059.codfw.wmnet
  • 12:24 claime: Uncordoning kubernetes20(5[4789]|60).codfw.wmnet - T352369
  • 12:22 claime: Pooling kubernetes20(5[4789]|60).codfw.wmnet - T352369
  • 12:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53990 and previous config saved to /var/cache/conftool/dbconfig/20231130-121554-arnaudb.json
  • 12:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53989 and previous config saved to /var/cache/conftool/dbconfig/20231130-120911-arnaudb.json
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53988 and previous config saved to /var/cache/conftool/dbconfig/20231130-120841-arnaudb.json
  • 12:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53987 and previous config saved to /var/cache/conftool/dbconfig/20231130-120819-arnaudb.json
  • 12:06 claime: Running homer 'cr*codfw*' commit T352369
  • 12:03 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:03 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2059.codfw.wmnet with OS bullseye
  • 11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53986 and previous config saved to /var/cache/conftool/dbconfig/20231130-115406-arnaudb.json
  • 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P53985 and previous config saved to /var/cache/conftool/dbconfig/20231130-115312-arnaudb.json
  • 11:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2060.codfw.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2058.codfw.wmnet with OS bullseye
  • 11:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53984 and previous config saved to /var/cache/conftool/dbconfig/20231130-113901-arnaudb.json
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P53983 and previous config saved to /var/cache/conftool/dbconfig/20231130-113804-arnaudb.json
  • 11:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2057.codfw.wmnet with OS bullseye
  • 11:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53982 and previous config saved to /var/cache/conftool/dbconfig/20231130-112356-arnaudb.json
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53981 and previous config saved to /var/cache/conftool/dbconfig/20231130-112258-arnaudb.json
  • 11:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 11:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 11:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53980 and previous config saved to /var/cache/conftool/dbconfig/20231130-111546-arnaudb.json
  • 11:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53979 and previous config saved to /var/cache/conftool/dbconfig/20231130-111524-arnaudb.json
  • 11:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 11:11 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53978 and previous config saved to /var/cache/conftool/dbconfig/20231130-110851-arnaudb.json
  • 11:01 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2060.codfw.wmnet with OS bullseye
  • 11:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P53977 and previous config saved to /var/cache/conftool/dbconfig/20231130-110017-arnaudb.json
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2059.codfw.wmnet with OS bullseye
  • 10:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2058.codfw.wmnet with OS bullseye
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53976 and previous config saved to /var/cache/conftool/dbconfig/20231130-105346-arnaudb.json
  • 10:52 moritzm: installing python-git security updates
  • 10:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2057.codfw.wmnet with OS bullseye
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P53975 and previous config saved to /var/cache/conftool/dbconfig/20231130-104510-arnaudb.json
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53974 and previous config saved to /var/cache/conftool/dbconfig/20231130-103841-arnaudb.json
  • 10:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53973 and previous config saved to /var/cache/conftool/dbconfig/20231130-103004-arnaudb.json
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53972 and previous config saved to /var/cache/conftool/dbconfig/20231130-102336-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53971 and previous config saved to /var/cache/conftool/dbconfig/20231130-102255-arnaudb.json
  • 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53970 and previous config saved to /var/cache/conftool/dbconfig/20231130-100830-arnaudb.json
  • 10:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:59 vgutierrez: rolling restart of pybal on lvs2011 and lvs2014, effectively enabling IPIP encapsulation on ncredir@codfw - T351069
  • 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53969 and previous config saved to /var/cache/conftool/dbconfig/20231130-095325-arnaudb.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir3003.esams.wmnet
  • 09:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir3003.esams.wmnet
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 is depooled', diff saved to https://phabricator.wikimedia.org/P53968 and previous config saved to /var/cache/conftool/dbconfig/20231130-093814-arnaudb.json
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'change es2 master to es2026 as es2033 is rebooting', diff saved to https://phabricator.wikimedia.org/P53967 and previous config saved to /var/cache/conftool/dbconfig/20231130-093740-arnaudb.json
  • 09:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:35 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3003.esams.wmnet
  • 09:33 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3003.esams.wmnet
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:19 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
  • 09:15 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.7 refs T350083
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53966 and previous config saved to /var/cache/conftool/dbconfig/20231130-090242-root.json
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
  • 08:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 08:55 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 08:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install6002.wikimedia.org
  • 08:52 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53965 and previous config saved to /var/cache/conftool/dbconfig/20231130-084737-root.json
  • 08:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install6002.wikimedia.org
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1126 from dbctl T352362', diff saved to https://phabricator.wikimedia.org/P53964 and previous config saved to /var/cache/conftool/dbconfig/20231130-084655-marostegui.json
  • 08:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 08:44 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T352362', diff saved to https://phabricator.wikimedia.org/P53963 and previous config saved to /var/cache/conftool/dbconfig/20231130-084015-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53962 and previous config saved to /var/cache/conftool/dbconfig/20231130-083232-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53961 and previous config saved to /var/cache/conftool/dbconfig/20231130-083231-root.json
  • 08:28 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53960 and previous config saved to /var/cache/conftool/dbconfig/20231130-081727-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53959 and previous config saved to /var/cache/conftool/dbconfig/20231130-081726-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53958 and previous config saved to /var/cache/conftool/dbconfig/20231130-080222-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53957 and previous config saved to /var/cache/conftool/dbconfig/20231130-080220-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53956 and previous config saved to /var/cache/conftool/dbconfig/20231130-074717-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53955 and previous config saved to /var/cache/conftool/dbconfig/20231130-074715-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53954 and previous config saved to /var/cache/conftool/dbconfig/20231130-073212-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53953 and previous config saved to /var/cache/conftool/dbconfig/20231130-073210-root.json
  • 07:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bookworm
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS bookworm
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 06:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 06:45 kart_: Updated Apertium to 2023-11-30-061450-production (T270060)
  • 06:44 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 06:43 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 06:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS bookworm
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bookworm
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210 T351283', diff saved to https://phabricator.wikimedia.org/P53952 and previous config saved to /var/cache/conftool/dbconfig/20231130-063317-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T351283', diff saved to https://phabricator.wikimedia.org/P53951 and previous config saved to /var/cache/conftool/dbconfig/20231130-063258-root.json
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1159.eqiad.wmnet with OS bookworm
  • 06:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: host reimage
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: host reimage
  • 05:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1159.eqiad.wmnet with OS bookworm
  • 05:47 marostegui: Failover m3 from db1159 to db1119 - T352149
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 02:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2060.codfw.wmnet with OS bullseye
  • 02:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2059.codfw.wmnet with OS bullseye
  • 02:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 02:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 02:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2058.codfw.wmnet with OS bullseye
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 02:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2060.codfw.wmnet with OS bullseye
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2057.codfw.wmnet with OS bullseye
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2059.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2058.codfw.wmnet with OS bullseye
  • 01:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2057.codfw.wmnet with OS bullseye
  • 00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2034.codfw.wmnet with OS bullseye
  • 00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2033.codfw.wmnet with OS bullseye
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2034.codfw.wmnet with reason: host reimage
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2034.codfw.wmnet with reason: host reimage
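
Note: the entries above follow a regular shape (`HH:MM actor: message`), and cookbook runs are logged as `START - Cookbook …` and `END (PASS|FAIL) - Cookbook …`. The sketch below is only an illustration of how such lines could be split and tallied. It is not part of any Wikimedia tooling, and the names `parse_entry`, `cookbook_status`, and `tally` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed line shape, taken from the entries in this log:
#   "  • 08:28 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage ..."
ENTRY_RE = re.compile(r"^\s*(?:•\s*)?(\d{2}:\d{2})\s+(\S+?):\s+(.*)$")
END_RE = re.compile(r"END \((PASS|FAIL)\) - Cookbook ")


class Entry(NamedTuple):
    time: str     # "HH:MM" as written in the log
    actor: str    # e.g. "marostegui@cumin1001" or a bare nick
    message: str  # everything after the first "actor: "


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into (time, actor, message); None if it is not an entry."""
    m = ENTRY_RE.match(line)
    return Entry(*m.groups()) if m else None


def cookbook_status(message: str) -> Optional[str]:
    """Return 'START', 'PASS' or 'FAIL' for cookbook lines, else None."""
    if message.startswith("START - Cookbook "):
        return "START"
    m = END_RE.match(message)
    return m.group(1) if m else None


def tally(lines: Iterable[str]) -> Counter:
    """Count cookbook START/PASS/FAIL lines, skipping headings and other entries."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        status = cookbook_status(entry.message)
        if status is not None:
            counts[status] += 1
    return counts
```

Date headings such as `2023-11-29` do not match the entry pattern and are skipped by `tally`.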

2023-11-29

  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2033.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2091.codfw.wmnet with OS bookworm
  • 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2033.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2091.codfw.wmnet with reason: host reimage
  • 23:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2091.codfw.wmnet with reason: host reimage
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2033.codfw.wmnet with OS bullseye
  • 23:15 ladsgroup@deploy2002: Finished scap: Backport for Add virtual domain for botpasswords (T351559) (duration: 09m 28s)
  • 23:08 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 23:07 ladsgroup@deploy2002: ladsgroup: Backport for Add virtual domain for botpasswords (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:06 ladsgroup@deploy2002: Started scap: Backport for Add virtual domain for botpasswords (T351559)
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2090.codfw.wmnet with OS bookworm
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2091.codfw.wmnet with OS bookworm
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2090.codfw.wmnet with reason: host reimage
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2090.codfw.wmnet with reason: host reimage
  • 22:39 cstone: payments-wiki upgraded from 958cacac to 7feabffe
  • 22:31 eileen: civicrm upgraded from f7cdc727 to 83816165
  • 22:28 tgr@deploy2002: Finished scap: Backport for mobile: Remove $wgMobileUrlTemplate (duration: 20m 53s)
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2090.codfw.wmnet with OS bookworm
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2089.codfw.wmnet with OS bookworm
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:19 tgr@deploy2002: tgr: Continuing with sync
  • 22:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:10 inflatador: bking@cumin2002 running puppet against cp hosts to apply 978134
  • 22:08 tgr@deploy2002: tgr: Backport for mobile: Remove $wgMobileUrlTemplate synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:07 tgr@deploy2002: Started scap: Backport for mobile: Remove $wgMobileUrlTemplate
  • 22:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2089.codfw.wmnet with reason: host reimage
  • 22:01 tgr@deploy2002: Finished scap: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257) (duration: 10m 35s)
  • 21:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2089.codfw.wmnet with reason: host reimage
  • 21:55 tgr@deploy2002: dani and tgr and jdlrobson: Continuing with sync
  • 21:52 tgr@deploy2002: dani and tgr and jdlrobson: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 tgr@deploy2002: Started scap: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257)
  • 21:49 tgr@deploy2002: Backport cancelled.
  • 21:47 eileen: civicrm upgraded from 456b4805 to f7cdc727
  • 21:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2089.codfw.wmnet with OS bookworm
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2088.codfw.wmnet with OS bookworm
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:34 tgr@deploy2002: Finished scap: Backport for Deploy Annual Plan Core Metrics survey (T351353) (duration: 13m 47s)
  • 21:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:27 tgr@deploy2002: tgr and dani: Continuing with sync
  • 21:21 tgr@deploy2002: tgr and dani: Backport for Deploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:20 tgr@deploy2002: Started scap: Backport for Deploy Annual Plan Core Metrics survey (T351353)
  • 21:17 tgr@deploy2002: Finished scap: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074) (duration: 10m 18s)
  • 21:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 21:11 tgr@deploy2002: tgr and jdrewniak: Continuing with sync
  • 21:08 tgr@deploy2002: tgr and jdrewniak: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 tgr@deploy2002: Started scap: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074)
  • 21:03 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:01 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:00 sukhe: dummy authdns-update
  • 20:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2088.codfw.wmnet with OS bookworm
  • 20:53 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2087.codfw.wmnet with OS bookworm
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:51 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:50 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:48 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:47 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:45 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2003.codfw.wmnet with OS bullseye
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:25 sukhe: [correction] sudo cumin -b1 -s60 "A:dns-rec and not P{dns6001*}" "enable-puppet 'do not enable' && run-puppet-agent": T347054
  • 20:25 sukhe: sudo cumin -s1 -b60 "A:dns-rec and not P{dns6001*}" "enable-puppet 'do not enable' && run-puppet-agent": T347054
  • 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2087.codfw.wmnet with reason: host reimage
  • 20:22 sukhe: dns6001: running dummy authdns-update
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2087.codfw.wmnet with reason: host reimage
  • 20:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2003.codfw.wmnet with reason: host reimage
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2003.codfw.wmnet with reason: host reimage
  • 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2087.codfw.wmnet with OS bookworm
  • 19:39 sukhe: running authdns-update from dns6001
  • 19:26 sukhe: disable puppet on A:dns-rec to roll out CR 977101: T347054
  • 19:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:18 ejegg: payments-wiki upgraded from 44a41216 to 958cacac
  • 19:17 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:07 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:58 sukhe: re-enable Puppet on A:dns-rec
  • 18:58 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2003.codfw.wmnet with OS bullseye
  • 18:31 sukhe: disable puppet on A:dns-rec to roll out CR 976254: T347054
  • 17:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema2004.codfw.wmnet with OS bookworm
  • 17:47 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:47 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:43 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:43 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:41 sukhe: [finished] running dummy authdns-update, all 14 hosts affected
  • 17:40 sukhe: running dummy authdns-update
  • 17:35 sukhe: A:dns-rec: force run-puppet-agent
  • 17:35 sukhe: re-enable Puppet on A:dns-rec
  • 17:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema2004.codfw.wmnet with reason: host reimage
  • 17:26 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema2004.codfw.wmnet with reason: host reimage
  • 17:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logging-hd2003.codfw.wmnet with OS bullseye
  • 17:17 cdanis: depooling cp2029 for some manual testing
  • 17:14 sukhe: disable puppet on A:dns-rec to roll out CR 975843: T347054
  • 17:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema2004.codfw.wmnet with OS bookworm
  • 16:55 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:54 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:54 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:53 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:53 sukhe: sudo confctl --object-type dnsbox select 'dc=.*' set/pooled=yes T347054
  • 16:52 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:48 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:45 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:45 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:43 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:39 sukhe: confctl --object-type dnsbox select 'name=<host>' set/ip=<ip>
  • 16:37 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:36 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:32 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 16:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:21 ejegg: fundraising civicrm upgraded from efa3ea29 to 456b4805
  • 16:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4010.ulsfo.wmnet
  • 16:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema2003.codfw.wmnet with OS bookworm
  • 16:13 elukey: restart pyrra-filesystem on titan*
  • 16:13 elukey: reload all thanos-rule daemons on titan* to pick up new Pyrra Lift Wing rules
  • 16:13 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:12 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4009.ulsfo.wmnet
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 16:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:05 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:04 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:04 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema2003.codfw.wmnet with reason: host reimage
  • 15:59 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema2003.codfw.wmnet with reason: host reimage
  • 15:56 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4008.ulsfo.wmnet
  • 15:54 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:52 dancy@deploy2002: Installing scap version "4.64.0" for 570 hosts
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns2004.wikimedia.org
  • 15:50 dancy@deploy2002: Installing scap version "4.64.0" for 570 hosts
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2002.codfw.wmnet with OS bullseye
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2096']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1171.eqiad.wmnet with reason: host reimage
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1162.eqiad.wmnet with reason: host reimage
  • 15:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
  • 15:39 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1161.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1171.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1162.eqiad.wmnet with reason: host reimage
  • 15:37 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:36 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:36 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1161.eqiad.wmnet with reason: host reimage
  • 15:35 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:35 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:35 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:33 bblack: cp3066 - all back to normal ops
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 15:30 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema2003.codfw.wmnet with OS bookworm
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1162']
  • 15:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns2004.wikimedia.org
  • 15:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1161']
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns1004.wikimedia.org
  • 15:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 15:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1171']
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 15:22 bblack: cp3066 - depool temporarily, log pipe debugging, etc
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 15:21 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2001.codfw.wmnet with OS bullseye
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:21 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1162']
  • 15:20 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ printf '%s\n' https://en.wikipedia.org/static/images/mobile/copyright/{wikibooks,wikivoyage}-{tagline,wordmark}-he.svg | mwscript purgeList enwiki # T351913, T351981
  • 15:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1161']
  • 15:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913) (duration: 09m 10s)
  • 15:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns1004.wikimedia.org
  • 15:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema1004.eqiad.wmnet with OS bookworm
  • 15:13 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
  • 15:12 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:10 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913)
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2003.codfw.wmnet with OS bullseye
  • 15:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
  • 15:05 bblack: cp4052 - back to normal operations
  • 15:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2002.codfw.wmnet with reason: host reimage
  • 15:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 15:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema1004.eqiad.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2002.codfw.wmnet with reason: host reimage
  • 15:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
  • 14:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema1004.eqiad.wmnet with reason: host reimage
  • 14:55 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:55 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Configure wiki-highlights experiment stream (T348613) (duration: 42m 58s)
  • 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2001.codfw.wmnet with reason: host reimage
  • 14:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2001.codfw.wmnet with reason: host reimage
  • 14:43 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema1004.eqiad.wmnet with OS bookworm
  • 14:39 bblack: cp4052 - depool and disable puppet agent, more pipe debug
  • 14:38 lucaswerkmeister-wmde@deploy2002: sbisson and lucaswerkmeister-wmde: Continuing with sync
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema1003.eqiad.wmnet with OS bookworm
  • 14:36 lucaswerkmeister-wmde@deploy2002: sbisson and lucaswerkmeister-wmde: Backport for Configure wiki-highlights experiment stream (T348613) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2002.codfw.wmnet with OS bullseye
  • 14:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema1003.eqiad.wmnet with reason: host reimage
  • 14:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema1003.eqiad.wmnet with reason: host reimage
  • 14:17 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2001.codfw.wmnet with OS bullseye
  • 14:15 elukey: reload thanos-rule on titan[12]001 to pick up new pyrra rec rules
  • 14:12 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Configure wiki-highlights experiment stream (T348613)
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema1003.eqiad.wmnet with OS bookworm
  • 13:42 moritzm: installing tiff security updates
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.swift.audit-labels (exit_code=0) for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:33 jbond@cumin1001: START - Cookbook sre.swift.audit-labels for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:30 jbond@cumin1001: END (FAIL) - Cookbook sre.swift.audit-labels (exit_code=99) for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:30 jbond@cumin1001: START - Cookbook sre.swift.audit-labels for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
  • 13:05 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 13:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 13:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:01 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
  • 12:58 topranks: restoring DB snapshot from 11:37 UTC to netboxdb1002
  • 12:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:44 hashar@deploy2002: Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 07s)
  • 12:43 hashar@deploy2002: Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:36 hashar@deploy2002: Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 06s)
  • 12:35 hashar@deploy2002: Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:35 hashar@deploy2002: Finished deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 12s)
  • 12:35 hashar@deploy2002: Started deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:25 vgutierrez: rolling restart of pybal on lvs4008 and lvs4010, effectively enabling IPIP encapsulation for ncredir@ulsfo - T351069
  • 12:22 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 15s)
  • 12:22 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[1075-1090].eqiad.wmnet
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 12:05 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:04 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 12:02 hashar: Disabled Puppet agent on gerrit1003 and gerrit2002 to roll https://gerrit.wikimedia.org/r/844998 which requires some manual steps | T317412
  • 11:26 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:23 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 11:21 vgutierrez: upload tcp-mss-clamper 0.3+deb12u1 to apt.wm.o (bookworm) - T352249
  • 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:14 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:12 btullis: re-enabled all DAGs on all airflow instances after airflow upgrade to 2.7.3
  • 10:57 vgutierrez: upload ipip-multiqueue-optimizer 0.3+deb11u1 to apt.wm.o (bullseye) - T352249
  • 10:56 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:56 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:53 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:50 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:49 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:37 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp[1075-1090].eqiad.wmnet
  • 10:37 btullis: pausing all active dags on all airflow instances
  • 10:36 fabfur: decommissioning cp1075-1090 (T352253)
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53938 and previous config saved to /var/cache/conftool/dbconfig/20231129-092808-root.json
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:20 hashar@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.7 refs T350083 (duration: 07m 23s)
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.7 refs T350083
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53937 and previous config saved to /var/cache/conftool/dbconfig/20231129-091303-root.json
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:02 hashar@deploy2002: Finished scap: Backport for zghwiki: add logos (T350241) (duration: 09m 39s)
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53936 and previous config saved to /var/cache/conftool/dbconfig/20231129-085758-root.json
  • 08:55 hashar@deploy2002: hashar and anzx: Continuing with sync
  • 08:54 marostegui: Failover m1-master from dbproxy1022 to dbproxy1024 T351864
  • 08:53 hashar@deploy2002: hashar and anzx: Backport for zghwiki: add logos (T350241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:52 hashar@deploy2002: Started scap: Backport for zghwiki: add logos (T350241)
  • 08:48 hashar@deploy2002: Finished scap: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926) (duration: 10m 10s)
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53935 and previous config saved to /var/cache/conftool/dbconfig/20231129-084253-root.json
  • 08:41 hashar@deploy2002: hashar and anzx: Continuing with sync
  • 08:39 hashar@deploy2002: hashar and anzx: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:38 hashar@deploy2002: Started scap: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926)
  • 08:33 marostegui: Drop oathauth_users from centralauth T348693
  • 08:28 marostegui: Drop oathauth_users from fishbowl.dblist T348693
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53934 and previous config saved to /var/cache/conftool/dbconfig/20231129-082748-root.json
  • 08:22 marostegui: Drop oathauth_users from private.dblist T348693
  • 08:19 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2" (duration: 08m 01s)
  • 08:13 marostegui@deploy2002: marostegui: Continuing with sync
  • 08:12 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53933 and previous config saved to /var/cache/conftool/dbconfig/20231129-081243-root.json
  • 08:11 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2"
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bookworm
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53932 and previous config saved to /var/cache/conftool/dbconfig/20231129-075738-root.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1027.eqiad.wmnet with OS bookworm
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1027.eqiad.wmnet with reason: host reimage
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: host reimage
  • 07:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on es1027.eqiad.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bookworm
  • 07:25 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620) (duration: 09m 25s)
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1027.eqiad.wmnet with OS bookworm
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 T351916', diff saved to https://phabricator.wikimedia.org/P53931 and previous config saved to /var/cache/conftool/dbconfig/20231129-072306-root.json
  • 07:18 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:17 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:15 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620)
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 04:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 03:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 03:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 03:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 03:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 03:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1164.eqiad.wmnet with OS bullseye
  • 03:11 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:10 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1168.eqiad.wmnet with OS bullseye
  • 03:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 03:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 02:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 02:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 02:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:45 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:43 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:42 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1170.eqiad.wmnet with OS bullseye
  • 02:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1174.eqiad.wmnet with OS bullseye
  • 02:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:30 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1167.eqiad.wmnet with OS bullseye
  • 02:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1173.eqiad.wmnet with OS bullseye
  • 02:28 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1168.eqiad.wmnet with OS bullseye
  • 02:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1164.eqiad.wmnet with OS bullseye
  • 02:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1175.eqiad.wmnet with OS bullseye
  • 02:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1165.eqiad.wmnet with OS bullseye
  • 02:25 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:24 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1169.eqiad.wmnet with OS bullseye
  • 02:24 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1166.eqiad.wmnet with OS bullseye
  • 02:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1168.eqiad.wmnet with OS bullseye
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1163.eqiad.wmnet with OS bullseye
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:15 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1164.eqiad.wmnet with OS bullseye
  • 02:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1170.eqiad.wmnet with reason: host reimage
  • 02:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1174.eqiad.wmnet with reason: host reimage
  • 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1175.eqiad.wmnet with reason: host reimage
  • 02:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1167.eqiad.wmnet with reason: host reimage
  • 02:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1173.eqiad.wmnet with reason: host reimage
  • 02:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1165.eqiad.wmnet with reason: host reimage
  • 02:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 02:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1170.eqiad.wmnet with reason: host reimage
  • 02:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 01:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1175.eqiad.wmnet with reason: host reimage
  • 01:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1174.eqiad.wmnet with reason: host reimage
  • 01:58 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1173.eqiad.wmnet with reason: host reimage
  • 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1166.eqiad.wmnet with reason: host reimage
  • 01:57 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1163.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1167.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1166.eqiad.wmnet with reason: host reimage
  • 01:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 01:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1165.eqiad.wmnet with reason: host reimage
  • 01:52 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1163.eqiad.wmnet with reason: host reimage
  • 01:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1175.eqiad.wmnet with OS bullseye
  • 01:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1174.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1173.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 01:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1170.eqiad.wmnet with OS bullseye
  • 01:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
  • 01:42 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1167.eqiad.wmnet with OS bullseye
  • 01:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1166.eqiad.wmnet with OS bullseye
  • 01:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1164.eqiad.wmnet with OS bullseye
  • 01:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1165.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1163.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 01:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 00:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2028.codfw.wmnet with OS bullseye
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2035.codfw.wmnet with OS bullseye
  • 00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 00:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:25 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2035.codfw.wmnet with reason: host reimage
  • 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 00:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2034.codfw.wmnet with OS bullseye
  • 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2035.codfw.wmnet with reason: host reimage
  • 00:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1159.eqiad.wmnet with OS bullseye
  • 00:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1160.eqiad.wmnet with reason: host reimage
  • 00:02 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1160.eqiad.wmnet with reason: host reimage

2023-11-28

  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2034.codfw.wmnet with reason: host reimage
  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2035.codfw.wmnet with OS bullseye
  • 23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:51 bblack: cp4052 - all back to normal
  • 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2034.codfw.wmnet with reason: host reimage
  • 23:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1159.eqiad.wmnet with reason: host reimage
  • 23:42 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2032.codfw.wmnet with OS bullseye
  • 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1159.eqiad.wmnet with reason: host reimage
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2034.codfw.wmnet with OS bullseye
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2030.codfw.wmnet with OS bullseye
  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:25 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1159.eqiad.wmnet with OS bullseye
  • 23:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2032.codfw.wmnet with reason: host reimage
  • 23:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2032.codfw.wmnet with reason: host reimage
  • 23:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 23:15 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 23:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2030.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2030.codfw.wmnet with reason: host reimage
  • 23:07 bblack: cp4052 - repool
  • 23:05 bblack: cp4052 - depool temporarily
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2032.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2030.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2031.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:49 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2030.codfw.wmnet with OS bullseye
  • 22:33 bblack: cp4052 - disabling puppet to experiment on how we gather prometheus stats from ATS...
  • 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2031.codfw.wmnet with reason: host reimage
  • 22:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2031.codfw.wmnet with reason: host reimage
  • 22:23 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 22:20 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 22:19 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 22:12 urbanecm@deploy2002: Finished scap: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461) (duration: 13m 02s)
  • 22:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2031.codfw.wmnet with OS bullseye
  • 22:04 urbanecm@deploy2002: urbanecm and ssastry and jdlrobson: Continuing with sync
  • 22:01 urbanecm@deploy2002: urbanecm and ssastry and jdlrobson: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:59 urbanecm@deploy2002: Started scap: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461)
  • 21:58 urbanecm@deploy2002: Finished scap: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute (duration: 31m 09s)
  • 21:52 urbanecm@deploy2002: cscott and urbanecm and dani: Continuing with sync
  • 21:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2029.codfw.wmnet with OS bullseye
  • 21:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 21:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2029.codfw.wmnet with reason: host reimage
  • 21:29 urbanecm@deploy2002: cscott and urbanecm and dani: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 urbanecm@deploy2002: Started scap: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute
  • 21:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2029.codfw.wmnet with reason: host reimage
  • 21:18 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 21:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2029.codfw.wmnet with OS bullseye
  • 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1158.eqiad.wmnet with reason: host reimage
  • 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1158.eqiad.wmnet with reason: host reimage
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 ladsgroup@deploy2002: Finished scap: Backport for Disable VipsScaler in group0 (T290759) (duration: 10m 08s)
  • 20:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 20:30 ladsgroup@deploy2002: ladsgroup: Backport for Disable VipsScaler in group0 (T290759) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:29 ladsgroup@deploy2002: Started scap: Backport for Disable VipsScaler in group0 (T290759)
  • 20:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 20:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:16 milimetric@deploy2002: Finished deploy [analytics/refinery@72ec207] (thin): hotfix for webrequest refine (duration: 00m 07s)
  • 18:16 milimetric@deploy2002: Started deploy [analytics/refinery@72ec207] (thin): hotfix for webrequest refine
  • 18:15 milimetric@deploy2002: Finished deploy [analytics/refinery@72ec207]: hotfix for webrequest refine (duration: 08m 47s)
  • 18:06 milimetric@deploy2002: Started deploy [analytics/refinery@72ec207]: hotfix for webrequest refine
  • 17:49 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS bullseye
  • 17:04 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:03 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 16:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2030.codfw.wmnet with OS bullseye
  • 16:49 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2029.codfw.wmnet with OS bullseye
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 16:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:34 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2012.codfw.wmnet with OS bullseye
  • 16:23 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:23 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:17 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2012.codfw.wmnet with OS bullseye
  • 16:07 moritzm: installing distro-info-data updates
  • 16:04 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2011.codfw.wmnet
  • 16:01 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2011.codfw.wmnet
  • 15:54 moritzm: installing xen security updates
  • 15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS bullseye
  • 15:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2029.codfw.wmnet with OS bullseye
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2096']
  • 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2004.codfw.wmnet
  • 15:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2004.codfw.wmnet
  • 15:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2096']
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:25 moritzm: imported ganeti 3.0.2-1~deb11u1+wmf1 to apt.wikimedia.org/bullseye-wikimedia T350686
  • 15:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 15:15 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2011.codfw.wmnet with OS bullseye
  • 15:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53930 and previous config saved to /var/cache/conftool/dbconfig/20231128-150618-root.json
  • 15:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 14:58 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337) (duration: 10m 17s)
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53929 and previous config saved to /var/cache/conftool/dbconfig/20231128-145113-root.json
  • 14:46 lucaswerkmeister-wmde@deploy2002: sfaci and lucaswerkmeister-wmde: Continuing with sync
  • 14:43 lucaswerkmeister-wmde@deploy2002: sfaci and lucaswerkmeister-wmde: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337)
  • 14:39 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove mediawiki.web_ui.interactions event stream (T351195) (duration: 17m 36s)
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53928 and previous config saved to /var/cache/conftool/dbconfig/20231128-143608-root.json
  • 14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Continuing with sync
  • 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for Remove mediawiki.web_ui.interactions event stream (T351195) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove mediawiki.web_ui.interactions event stream (T351195)
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53927 and previous config saved to /var/cache/conftool/dbconfig/20231128-142102-root.json
  • 14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove EchoMail and EchoInteraction event streams (T344167) (duration: 10m 05s)
  • 14:13 lucaswerkmeister-wmde@deploy2002: phuedx and lucaswerkmeister-wmde: Continuing with sync
  • 14:10 volans: deploying python3-wmflib_1.2.4 fleet-wide (tested changes on all OSes)
  • 14:10 lucaswerkmeister-wmde@deploy2002: phuedx and lucaswerkmeister-wmde: Backport for Remove EchoMail and EchoInteraction event streams (T344167) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove EchoMail and EchoInteraction event streams (T344167)
  • 14:06 lucaswerkmeister-wmde@deploy2002: Backport cancelled.
  • 14:06 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53926 and previous config saved to /var/cache/conftool/dbconfig/20231128-140557-root.json
  • 14:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:05 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:04 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:03 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53925 and previous config saved to /var/cache/conftool/dbconfig/20231128-135052-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53924 and previous config saved to /var/cache/conftool/dbconfig/20231128-133547-root.json
  • 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS bookworm
  • 13:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 13:18 volans: uploaded python3-wmflib_1.2.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 12:58 XioNoX: re-enable sampling on cr1-esams:fpc1
  • 12:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS bookworm
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2028 T351916', diff saved to https://phabricator.wikimedia.org/P53923 and previous config saved to /var/cache/conftool/dbconfig/20231128-125235-root.json
  • 12:35 kart_: Updated Apertium to 2023-11-23-055425-production (ie Bookworm!) (T346997)
  • 12:32 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 12:32 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 12:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 12:26 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 12:13 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 12:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 12:02 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:02 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:58 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:57 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:56 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:55 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:50 vgutierrez: pool ncredir4001
  • 11:42 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:42 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:41 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:41 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:33 volans@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-esams:xe-0/1/2
  • 11:33 volans@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-esams:xe-0/1/2
  • 11:22 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:22 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:21 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:21 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:52 vgutierrez: depool ncredir4001
  • 10:45 vgutierrez: repool ncredir4001
  • 10:38 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 10:37 moritzm: installing lua5.3 security updates
  • 10:35 vgutierrez: depool ncredir4001
  • 10:21 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008, effectively disabling IPIP encapsulation on ncredir@ulsfo - T351069
  • 10:09 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008, effectively enabling IPIP encapsulation on ncredir@ulsfo - T351069
  • 10:01 sg912@deploy2002: Finished deploy [airflow-dags/analytics@0283c11]: (no justification provided) (duration: 00m 47s)
  • 10:00 sg912@deploy2002: Started deploy [airflow-dags/analytics@0283c11]: (no justification provided)
  • 09:58 moritzm: installing intel-microcode security updates
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:40 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.7 refs T350083
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1024.eqiad.wmnet with OS bookworm
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 08:47 hashar@deploy2002: Finished scap: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs" (duration: 07m 54s)
  • 08:41 hashar@deploy2002: hashar and ssastry: Continuing with sync
  • 08:41 hashar@deploy2002: hashar and ssastry: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:39 hashar@deploy2002: Started scap: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs"
  • 08:37 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1024.eqiad.wmnet with OS bookworm
  • 07:39 XioNoX: add RPKI ROA for 193.46.90.0/24 - T309297
  • 04:56 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.4 (duration: 02m 14s)
  • 04:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.7 refs T350083 (duration: 51m 11s)
  • 04:21 eileen: civicrm upgraded from f3de1778 to c2eaa50e
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.7 refs T350083
  • 01:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 01:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 01:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 01:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2094']
  • 01:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2034']
  • 01:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2101']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2097']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 01:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 01:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 01:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2034']
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti2034']
  • 01:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2034']
  • 01:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2101']
  • 01:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2101']
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2101']
  • 01:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2097']
  • 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2097']
  • 01:05 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2097']
  • 01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:52 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2029.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2028.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage

2023-11-27

  • 23:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on planet1002.eqiad.wmnet with reason: maintenance
  • 22:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on planet1002.eqiad.wmnet with reason: maintenance
  • 22:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 22:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin1001"
  • 22:13 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin1001"
  • 21:59 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1157.eqiad.wmnet with reason: host reimage
  • 21:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1157.eqiad.wmnet with reason: host reimage
  • 21:42 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 21:28 cjming: end of UTC late backport window
  • 21:26 cjming@deploy2002: Finished scap: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703) (duration: 07m 57s)
  • 21:19 cjming@deploy2002: isaranto and cjming: Continuing with sync
  • 21:19 cjming@deploy2002: isaranto and cjming: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:18 cjming@deploy2002: Started scap: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703)
  • 21:15 cjming@deploy2002: Finished scap: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685) (duration: 11m 26s)
  • 21:09 cjming@deploy2002: cjming and tgr: Continuing with sync
  • 21:07 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 21:05 cjming@deploy2002: cjming and tgr: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 cjming@deploy2002: Started scap: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685)
  • 21:02 btullis@deploy2002: Finished deploy [airflow-dags/analytics_test@0283c11]: (no justification provided) (duration: 00m 11s)
  • 21:02 btullis@deploy2002: Started deploy [airflow-dags/analytics_test@0283c11]: (no justification provided)
  • 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 20:16 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T351069
  • 19:53 vgutierrez: restarting pybal on lvs4008 (effectively enabling IPIP encapsulation on ncredir@ulsfo) - T351069
  • 19:50 vgutierrez: restarting pybal on lvs4010 - T351069
  • 19:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 19:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 18:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host planet1003.eqiad.wmnet with OS bookworm
  • 18:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 18:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 18:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 18:11 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 18:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 18:07 vgutierrez: restarting pybal on lvs4010 - T351069
  • 17:52 vgutierrez: upload ipip-multiqueue-optimizer 0.2 to apt.wm.o (bullseye) - T351069
  • 17:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2058']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2057']
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2033']
  • 17:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2058']
  • 17:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2057']
  • 17:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 16:56 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 31s)
  • 16:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::monitoring
  • 16:50 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 06s)
  • 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 16:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1157.eqiad.wmnet
  • 16:39 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1157.eqiad.wmnet
  • 16:31 vgutierrez: upload tcp-mss-clamper 0.2+deb12u1 to apt.wm.o (bookworm)
  • 16:30 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::monitoring
  • 16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2058.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2057.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2003']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2002']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2001']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2109']
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2058.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2057.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2003']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2002']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2001']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2109']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2060']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2059']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2003']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2002']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2001']
  • 16:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2109']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2060']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2059']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2003']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2002']
  • 16:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2001']
  • 16:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2109']
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2109.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 sukhe: enable puppet and start bird on dns4003
  • 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2059.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4003.wikimedia.org
  • 16:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2060.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4003.wikimedia.org
  • 16:07 sukhe: disable puppet and stop bird on dns4003: rebooting
  • 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: zookeeper::flink
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2060.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2059.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2109.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2108']
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2107']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2106']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2105']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2104']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2103']
  • 15:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:50 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: zookeeper::flink
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2108']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2107']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2106']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2105']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2104']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2103']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2108']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2107']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2106']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2105']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2104']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2103']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2108']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2107']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2106']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2105']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2104']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2103']
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2104.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2103.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2106.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2107.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2105.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2107.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2106.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2105.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2104.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2103.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2098']
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2099']
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2102']
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2102']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2099']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2098']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2102']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2099']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2098']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2102']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2099']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2098']
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2099.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2102.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2098.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 fabfur: set `pooled=yes` on cp11.* hosts in eqiad T349244
  • 15:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 15:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:09 hashar: restarting CI Jenkins
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2102.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 fabfur: `confctl select name='cp10.*',service=ats-be set/pooled=inactive` (cdn and ats-be not used anymore on these hosts) T349244
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2099.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2098.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2095']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2093']
  • 14:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2092']
  • 14:57 urbanecm: mwmaint2002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue`
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2095']
  • 14:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2095']
  • 14:56 urbanecm: mwmaint2002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue`
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2095']
  • 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns4003.wikimedia.org
  • 14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2090']
  • 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2091']
  • 14:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2089']
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1175
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2093']
  • 14:53 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1175
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1161
  • 14:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2093']
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1161
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2094.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1159
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1159
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1158
  • 14:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2093']
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1158
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1157
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1157
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1160
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1160
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2088']
  • 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2092']
  • 14:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2092']
  • 14:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2092']
  • 14:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 14:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2090']
  • 14:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2090']
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2090']
  • 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2092.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2089']
  • 14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2089']
  • 14:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2089']
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2087']
  • 14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2090.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns4003.wikimedia.org
  • 14:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088']
  • 14:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2088']
  • 14:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2089.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088']
  • 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2088.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:43 urbanecm@deploy2002: Finished scap: Backport for Enable native MathML rendering on dewiki (T350787) (duration: 09m 19s)
  • 14:43 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2087']
  • 14:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2087']
  • 14:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2087']
  • 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:38 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:38 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:38 urbanecm@deploy2002: urbanecm and physikerwelt: Continuing with sync
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2094.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2087.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2092.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:35 urbanecm@deploy2002: urbanecm and physikerwelt: Backport for Enable native MathML rendering on dewiki (T350787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 urbanecm@deploy2002: Started scap: Backport for Enable native MathML rendering on dewiki (T350787)
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2090.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 urbanecm@deploy2002: Finished scap: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229) (duration: 10m 57s)
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2089.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2088.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:30 moritzm: installing protobuf security updates
  • 14:27 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2087.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:24 urbanecm@deploy2002: urbanecm and anzx: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 urbanecm@deploy2002: Started scap: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229)
  • 14:21 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373) (duration: 11m 23s)
  • 14:15 urbanecm@deploy2002: sgimeno and anzx and urbanecm: Continuing with sync
  • 14:11 urbanecm@deploy2002: sgimeno and anzx and urbanecm: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373)
  • 14:10 urbanecm@deploy2002: Finished scap: Backport for UserImpact: Bump VERSION to 10 (T329700) (duration: 07m 56s)
  • 14:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1007.eqiad.wmnet with OS bullseye
  • 14:03 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:03 urbanecm@deploy2002: urbanecm: Backport for UserImpact: Bump VERSION to 10 (T329700) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:02 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:02 urbanecm@deploy2002: Started scap: Backport for UserImpact: Bump VERSION to 10 (T329700)
  • 13:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1007.eqiad.wmnet with reason: host reimage
  • 13:45 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:43 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1007.eqiad.wmnet with reason: host reimage
  • 13:38 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:38 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:37 godog: roll-restart prometheus/ops in eqiad/codfw to apply space-based retention - T351179
  • 13:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:31 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:30 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1007.eqiad.wmnet with OS bullseye
  • 13:20 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:04 urbanecm@deploy2002: Finished scap: Backport for Compress geui_data json blobs (T351898), User impact: timezone cleanup (T329700), UserImpact: Make smaller SQL queries (T351898) (duration: 07m 37s)
  • 12:56 urbanecm@deploy2002: Started scap: Backport for Compress geui_data json blobs (T351898), User impact: timezone cleanup (T329700), UserImpact: Make smaller SQL queries (T351898)
  • 12:34 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:34 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:18 kart_: Updated cxserver to 2023-11-24-152117-production (T351932)
  • 12:15 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:15 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:14 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:13 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:08 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:08 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:08 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:08 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:07 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:06 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:05 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 12:05 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 12:04 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:04 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:58 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:57 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:55 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:55 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:36 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:35 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:35 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:31 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:30 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:30 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:29 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:01 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:58 elukey: powercycle ml-serve2007 (OEM/DIMM error registered in getsel)
  • 10:50 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:50 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:47 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:47 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58485
  • 09:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58485
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45706
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45706
  • 08:43 taavi@deploy2002: Finished scap: Backport for Add virtual domain mapping for OATHAuth (T348484) (duration: 07m 53s)
  • 08:41 godog: restart prometheus/k8s-staging in eqiad - T343529
  • 08:37 taavi@deploy2002: taavi: Continuing with sync
  • 08:36 taavi@deploy2002: taavi: Backport for Add virtual domain mapping for OATHAuth (T348484) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:35 taavi@deploy2002: Started scap: Backport for Add virtual domain mapping for OATHAuth (T348484)
  • 08:29 taavi@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143) (duration: 19m 54s)
  • 08:23 taavi@deploy2002: taavi and sgimeno: Continuing with sync
  • 08:18 taavi@deploy2002: taavi and sgimeno: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:14 moritzm: installing dpkg bugfix updates on bullseye
  • 08:09 taavi@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143)
  • 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2134.codfw.wmnet with OS bookworm
  • 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2134.codfw.wmnet with reason: host reimage
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2134.codfw.wmnet with reason: host reimage
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2134.codfw.wmnet with OS bookworm
  • 06:40 marostegui: Failover m2 from db1119 to db1195 - T351863
  • 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switch
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switch
  • 06:16 kart_: Update cxserver to 2023-11-20-052250-production (T341458, T349118)
  • 06:12 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-11-24

  • 15:01 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:00 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53911 and previous config saved to /var/cache/conftool/dbconfig/20231124-133300-arnaudb.json
  • 13:24 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:24 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 13:24 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:23 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 13:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53910 and previous config saved to /var/cache/conftool/dbconfig/20231124-131755-arnaudb.json
  • 13:17 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53909 and previous config saved to /var/cache/conftool/dbconfig/20231124-130250-arnaudb.json
  • 12:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53908 and previous config saved to /var/cache/conftool/dbconfig/20231124-124745-arnaudb.json
  • 12:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53907 and previous config saved to /var/cache/conftool/dbconfig/20231124-123240-arnaudb.json
  • 12:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53906 and previous config saved to /var/cache/conftool/dbconfig/20231124-121735-arnaudb.json
  • 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53905 and previous config saved to /var/cache/conftool/dbconfig/20231124-120230-arnaudb.json
  • 11:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53904 and previous config saved to /var/cache/conftool/dbconfig/20231124-114725-arnaudb.json
  • 11:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53903 and previous config saved to /var/cache/conftool/dbconfig/20231124-113220-arnaudb.json
  • 11:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53902 and previous config saved to /var/cache/conftool/dbconfig/20231124-111715-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'set es2032 back as es1 master for T344589', diff saved to https://phabricator.wikimedia.org/P53901 and previous config saved to /var/cache/conftool/dbconfig/20231124-111109-arnaudb.json
  • 11:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53900 and previous config saved to /var/cache/conftool/dbconfig/20231124-110948-arnaudb.json
  • 11:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:55 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53899 and previous config saved to /var/cache/conftool/dbconfig/20231124-105443-arnaudb.json
  • 10:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:53 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53898 and previous config saved to /var/cache/conftool/dbconfig/20231124-104733-arnaudb.json
  • 10:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53897 and previous config saved to /var/cache/conftool/dbconfig/20231124-104635-arnaudb.json
  • 10:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53896 and previous config saved to /var/cache/conftool/dbconfig/20231124-104023-arnaudb.json
  • 10:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53895 and previous config saved to /var/cache/conftool/dbconfig/20231124-103938-arnaudb.json
  • 10:39 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:39 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53894 and previous config saved to /var/cache/conftool/dbconfig/20231124-103753-arnaudb.json
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53893 and previous config saved to /var/cache/conftool/dbconfig/20231124-103700-arnaudb.json
  • 10:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53892 and previous config saved to /var/cache/conftool/dbconfig/20231124-103228-arnaudb.json
  • 10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53891 and previous config saved to /var/cache/conftool/dbconfig/20231124-103130-arnaudb.json
  • 10:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53890 and previous config saved to /var/cache/conftool/dbconfig/20231124-102518-arnaudb.json
  • 10:24 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53889 and previous config saved to /var/cache/conftool/dbconfig/20231124-102433-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53888 and previous config saved to /var/cache/conftool/dbconfig/20231124-102248-arnaudb.json
  • 10:22 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2421.codfw.wmnet with OS bullseye
  • 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53887 and previous config saved to /var/cache/conftool/dbconfig/20231124-102155-arnaudb.json
  • 10:20 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2431.codfw.wmnet with OS bullseye
  • 10:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53886 and previous config saved to /var/cache/conftool/dbconfig/20231124-101722-arnaudb.json
  • 10:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1474.eqiad.wmnet with OS bullseye
  • 10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53885 and previous config saved to /var/cache/conftool/dbconfig/20231124-101625-arnaudb.json
  • 10:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1475.eqiad.wmnet with OS bullseye
  • 10:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2425.codfw.wmnet with OS bullseye
  • 10:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1472.eqiad.wmnet with OS bullseye
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1473.eqiad.wmnet with OS bullseye
  • 10:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53884 and previous config saved to /var/cache/conftool/dbconfig/20231124-101013-arnaudb.json
  • 10:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53883 and previous config saved to /var/cache/conftool/dbconfig/20231124-100928-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53882 and previous config saved to /var/cache/conftool/dbconfig/20231124-100743-arnaudb.json
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53881 and previous config saved to /var/cache/conftool/dbconfig/20231124-100650-arnaudb.json
  • 10:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53880 and previous config saved to /var/cache/conftool/dbconfig/20231124-100218-arnaudb.json
  • 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 10:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53879 and previous config saved to /var/cache/conftool/dbconfig/20231124-100120-arnaudb.json
  • 09:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1474.eqiad.wmnet with reason: host reimage
  • 09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 09:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53878 and previous config saved to /var/cache/conftool/dbconfig/20231124-095508-arnaudb.json
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 09:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53877 and previous config saved to /var/cache/conftool/dbconfig/20231124-095423-arnaudb.json
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 09:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1472.eqiad.wmnet with reason: host reimage
  • 09:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1473.eqiad.wmnet with reason: host reimage
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53876 and previous config saved to /var/cache/conftool/dbconfig/20231124-095238-arnaudb.json
  • 09:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53875 and previous config saved to /var/cache/conftool/dbconfig/20231124-095145-arnaudb.json
  • 09:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1474.eqiad.wmnet with reason: host reimage
  • 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1473.eqiad.wmnet with reason: host reimage
  • 09:50 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1472.eqiad.wmnet with reason: host reimage
  • 09:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53874 and previous config saved to /var/cache/conftool/dbconfig/20231124-094713-arnaudb.json
  • 09:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53873 and previous config saved to /var/cache/conftool/dbconfig/20231124-094614-arnaudb.json
  • 09:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53872 and previous config saved to /var/cache/conftool/dbconfig/20231124-094001-arnaudb.json
  • 09:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53871 and previous config saved to /var/cache/conftool/dbconfig/20231124-093918-arnaudb.json
  • 09:39 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1475.eqiad.wmnet with OS bullseye
  • 09:39 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1474.eqiad.wmnet with OS bullseye
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1473.eqiad.wmnet with OS bullseye
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1472.eqiad.wmnet with OS bullseye
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53870 and previous config saved to /var/cache/conftool/dbconfig/20231124-093733-arnaudb.json
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS bullseye
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2425.codfw.wmnet with OS bullseye
  • 09:36 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2421.codfw.wmnet with OS bullseye
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53869 and previous config saved to /var/cache/conftool/dbconfig/20231124-093640-arnaudb.json
  • 09:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53868 and previous config saved to /var/cache/conftool/dbconfig/20231124-093207-arnaudb.json
  • 09:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53867 and previous config saved to /var/cache/conftool/dbconfig/20231124-093109-arnaudb.json
  • 09:24 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53866 and previous config saved to /var/cache/conftool/dbconfig/20231124-092456-arnaudb.json
  • 09:24 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53865 and previous config saved to /var/cache/conftool/dbconfig/20231124-092413-arnaudb.json
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53864 and previous config saved to /var/cache/conftool/dbconfig/20231124-092228-arnaudb.json
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53862 and previous config saved to /var/cache/conftool/dbconfig/20231124-092135-arnaudb.json
  • 09:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53861 and previous config saved to /var/cache/conftool/dbconfig/20231124-091702-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53860 and previous config saved to /var/cache/conftool/dbconfig/20231124-091639-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53859 and previous config saved to /var/cache/conftool/dbconfig/20231124-091621-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53858 and previous config saved to /var/cache/conftool/dbconfig/20231124-091604-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53857 and previous config saved to /var/cache/conftool/dbconfig/20231124-091601-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53856 and previous config saved to /var/cache/conftool/dbconfig/20231124-090951-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 35%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53855 and previous config saved to /var/cache/conftool/dbconfig/20231124-090908-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53854 and previous config saved to /var/cache/conftool/dbconfig/20231124-090723-arnaudb.json
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53853 and previous config saved to /var/cache/conftool/dbconfig/20231124-090630-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53852 and previous config saved to /var/cache/conftool/dbconfig/20231124-090157-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53851 and previous config saved to /var/cache/conftool/dbconfig/20231124-090134-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53850 and previous config saved to /var/cache/conftool/dbconfig/20231124-090116-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53849 and previous config saved to /var/cache/conftool/dbconfig/20231124-090059-arnaudb.json
  • 09:00 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53848 and previous config saved to /var/cache/conftool/dbconfig/20231124-090056-arnaudb.json
  • 08:54 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53847 and previous config saved to /var/cache/conftool/dbconfig/20231124-085446-arnaudb.json
  • 08:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53846 and previous config saved to /var/cache/conftool/dbconfig/20231124-085403-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53845 and previous config saved to /var/cache/conftool/dbconfig/20231124-085218-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53844 and previous config saved to /var/cache/conftool/dbconfig/20231124-085125-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53843 and previous config saved to /var/cache/conftool/dbconfig/20231124-084652-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53842 and previous config saved to /var/cache/conftool/dbconfig/20231124-084629-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53841 and previous config saved to /var/cache/conftool/dbconfig/20231124-084611-arnaudb.json
  • 08:45 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53840 and previous config saved to /var/cache/conftool/dbconfig/20231124-084554-arnaudb.json
  • 08:45 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53839 and previous config saved to /var/cache/conftool/dbconfig/20231124-084551-arnaudb.json
  • 08:39 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53838 and previous config saved to /var/cache/conftool/dbconfig/20231124-083941-arnaudb.json
  • 08:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53837 and previous config saved to /var/cache/conftool/dbconfig/20231124-083858-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53836 and previous config saved to /var/cache/conftool/dbconfig/20231124-083713-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53835 and previous config saved to /var/cache/conftool/dbconfig/20231124-083620-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53834 and previous config saved to /var/cache/conftool/dbconfig/20231124-083147-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53833 and previous config saved to /var/cache/conftool/dbconfig/20231124-083124-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53832 and previous config saved to /var/cache/conftool/dbconfig/20231124-083106-arnaudb.json
  • 08:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53831 and previous config saved to /var/cache/conftool/dbconfig/20231124-083049-arnaudb.json
  • 08:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53830 and previous config saved to /var/cache/conftool/dbconfig/20231124-083046-arnaudb.json
  • 08:24 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53829 and previous config saved to /var/cache/conftool/dbconfig/20231124-082436-arnaudb.json
  • 08:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53828 and previous config saved to /var/cache/conftool/dbconfig/20231124-082353-arnaudb.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53827 and previous config saved to /var/cache/conftool/dbconfig/20231124-082208-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53826 and previous config saved to /var/cache/conftool/dbconfig/20231124-082115-arnaudb.json
  • 08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53825 and previous config saved to /var/cache/conftool/dbconfig/20231124-081619-arnaudb.json
  • 08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53824 and previous config saved to /var/cache/conftool/dbconfig/20231124-081601-arnaudb.json
  • 08:15 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53823 and previous config saved to /var/cache/conftool/dbconfig/20231124-081541-arnaudb.json
  • 08:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 and db2193 to fix their config, will repool them asap', diff saved to https://phabricator.wikimedia.org/P53822 and previous config saved to /var/cache/conftool/dbconfig/20231124-081422-arnaudb.json
  • 08:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2181 and db2193 to fix their config, will repool them asap', diff saved to https://phabricator.wikimedia.org/P53821 and previous config saved to /var/cache/conftool/dbconfig/20231124-081304-arnaudb.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2001.codfw.wmnet with OS bookworm
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53820 and previous config saved to /var/cache/conftool/dbconfig/20231124-080848-arnaudb.json
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53819 and previous config saved to /var/cache/conftool/dbconfig/20231124-075343-arnaudb.json
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'repool API on db2181', diff saved to https://phabricator.wikimedia.org/P53818 and previous config saved to /var/cache/conftool/dbconfig/20231124-075137-arnaudb.json
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2001.codfw.wmnet with reason: host reimage
  • 07:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2001.codfw.wmnet with reason: host reimage
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53817 and previous config saved to /var/cache/conftool/dbconfig/20231124-073838-arnaudb.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53816 and previous config saved to /var/cache/conftool/dbconfig/20231124-073743-root.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2002.codfw.wmnet with OS bookworm
  • 07:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:35 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53815 and previous config saved to /var/cache/conftool/dbconfig/20231124-073510-root.json
  • 07:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2001.codfw.wmnet with OS bookworm
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53814 and previous config saved to /var/cache/conftool/dbconfig/20231124-072238-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53813 and previous config saved to /var/cache/conftool/dbconfig/20231124-072005-root.json
  • 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2002.codfw.wmnet with reason: host reimage
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2002.codfw.wmnet with reason: host reimage
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53812 and previous config saved to /var/cache/conftool/dbconfig/20231124-070733-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53811 and previous config saved to /var/cache/conftool/dbconfig/20231124-070500-root.json
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2002.codfw.wmnet with OS bookworm
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2003.codfw.wmnet with OS bookworm
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53810 and previous config saved to /var/cache/conftool/dbconfig/20231124-065228-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53809 and previous config saved to /var/cache/conftool/dbconfig/20231124-064955-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53808 and previous config saved to /var/cache/conftool/dbconfig/20231124-063723-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53807 and previous config saved to /var/cache/conftool/dbconfig/20231124-063450-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P53806 and previous config saved to /var/cache/conftool/dbconfig/20231124-063424-root.json
  • 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2003.codfw.wmnet with reason: host reimage
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122', diff saved to https://phabricator.wikimedia.org/P53805 and previous config saved to /var/cache/conftool/dbconfig/20231124-063152-root.json
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2003.codfw.wmnet with reason: host reimage
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bookworm

2023-11-23

  • 17:44 vgutierrez: repool ncredir4001
  • 17:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2420.codfw.wmnet with OS bullseye
  • 17:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2131.codfw.wmnet onto db2191.codfw.wmnet
  • 17:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 17:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 17:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 16:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS bullseye
  • 16:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53804 and previous config saved to /var/cache/conftool/dbconfig/20231123-163507-arnaudb.json
  • 16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53802 and previous config saved to /var/cache/conftool/dbconfig/20231123-162002-arnaudb.json
  • 16:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 16:13 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 16:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53801 and previous config saved to /var/cache/conftool/dbconfig/20231123-160457-arnaudb.json
  • 15:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53799 and previous config saved to /var/cache/conftool/dbconfig/20231123-154952-arnaudb.json
  • 15:44 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53798 and previous config saved to /var/cache/conftool/dbconfig/20231123-154425-arnaudb.json
  • 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53797 and previous config saved to /var/cache/conftool/dbconfig/20231123-153447-arnaudb.json
  • 15:29 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53796 and previous config saved to /var/cache/conftool/dbconfig/20231123-152920-arnaudb.json
  • 15:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53795 and previous config saved to /var/cache/conftool/dbconfig/20231123-151942-arnaudb.json
  • 15:14 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53794 and previous config saved to /var/cache/conftool/dbconfig/20231123-151415-arnaudb.json
  • 15:11 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1008.eqiad.wmnet with OS bullseye
  • 15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53793 and previous config saved to /var/cache/conftool/dbconfig/20231123-151122-arnaudb.json
  • 15:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53792 and previous config saved to /var/cache/conftool/dbconfig/20231123-150825-arnaudb.json
  • 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53790 and previous config saved to /var/cache/conftool/dbconfig/20231123-150437-arnaudb.json
  • 14:59 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53789 and previous config saved to /var/cache/conftool/dbconfig/20231123-145910-arnaudb.json
  • 14:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53788 and previous config saved to /var/cache/conftool/dbconfig/20231123-145617-arnaudb.json
  • 14:55 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1008.eqiad.wmnet with reason: host reimage
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53787 and previous config saved to /var/cache/conftool/dbconfig/20231123-145320-arnaudb.json
  • 14:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1008.eqiad.wmnet with reason: host reimage
  • 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53786 and previous config saved to /var/cache/conftool/dbconfig/20231123-144932-arnaudb.json
  • 14:44 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53785 and previous config saved to /var/cache/conftool/dbconfig/20231123-144405-arnaudb.json
  • 14:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53784 and previous config saved to /var/cache/conftool/dbconfig/20231123-144112-arnaudb.json
  • 14:38 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53783 and previous config saved to /var/cache/conftool/dbconfig/20231123-143815-arnaudb.json
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53782 and previous config saved to /var/cache/conftool/dbconfig/20231123-143427-arnaudb.json
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'temporary depool of db1242 to fix API', diff saved to https://phabricator.wikimedia.org/P53781 and previous config saved to /var/cache/conftool/dbconfig/20231123-143238-arnaudb.json
  • 14:31 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:31 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:30 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:30 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2175 repooling api', diff saved to https://phabricator.wikimedia.org/P53780 and previous config saved to /var/cache/conftool/dbconfig/20231123-142950-arnaudb.json
  • 14:29 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53779 and previous config saved to /var/cache/conftool/dbconfig/20231123-142900-arnaudb.json
  • 14:27 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 14:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53778 and previous config saved to /var/cache/conftool/dbconfig/20231123-142639-arnaudb.json
  • 14:26 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53777 and previous config saved to /var/cache/conftool/dbconfig/20231123-142607-arnaudb.json
  • 14:25 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53776 and previous config saved to /var/cache/conftool/dbconfig/20231123-142310-arnaudb.json
  • 14:22 fabfur: swap cp1115 <-> cp1090 (T349244)
  • 14:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1115.eqiad.wmnet
  • 14:21 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1115.eqiad.wmnet
  • 14:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:21 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:20 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:17 fabfur: swap cp1114 <-> cp1089 (T349244)
  • 14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1114.eqiad.wmnet
  • 14:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1114.eqiad.wmnet
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2004.codfw.wmnet with reason: host reimage
  • 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53775 and previous config saved to /var/cache/conftool/dbconfig/20231123-141355-arnaudb.json
  • 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2004.codfw.wmnet with reason: host reimage
  • 14:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53774 and previous config saved to /var/cache/conftool/dbconfig/20231123-141134-arnaudb.json
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53773 and previous config saved to /var/cache/conftool/dbconfig/20231123-141102-arnaudb.json
  • 14:09 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host irc1002.wikimedia.org
  • 14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53772 and previous config saved to /var/cache/conftool/dbconfig/20231123-140805-arnaudb.json
  • 13:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53771 and previous config saved to /var/cache/conftool/dbconfig/20231123-135850-arnaudb.json
  • 13:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host irc1002.wikimedia.org
  • 13:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53770 and previous config saved to /var/cache/conftool/dbconfig/20231123-135629-arnaudb.json
  • 13:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53769 and previous config saved to /var/cache/conftool/dbconfig/20231123-135557-arnaudb.json
  • 13:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:53 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53768 and previous config saved to /var/cache/conftool/dbconfig/20231123-135300-arnaudb.json
  • 13:49 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host irc2002.wikimedia.org
  • 13:45 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2131.codfw.wmnet onto db2191.codfw.wmnet
  • 13:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53767 and previous config saved to /var/cache/conftool/dbconfig/20231123-134345-arnaudb.json
  • 13:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2131 in db2191 for T343674', diff saved to https://phabricator.wikimedia.org/P53766 and previous config saved to /var/cache/conftool/dbconfig/20231123-134316-arnaudb.json
  • 13:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53765 and previous config saved to /var/cache/conftool/dbconfig/20231123-134124-arnaudb.json
  • 13:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53764 and previous config saved to /var/cache/conftool/dbconfig/20231123-134052-arnaudb.json
  • 13:39 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53763 and previous config saved to /var/cache/conftool/dbconfig/20231123-133755-arnaudb.json
  • 13:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host irc2002.wikimedia.org
  • 13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53762 and previous config saved to /var/cache/conftool/dbconfig/20231123-132840-arnaudb.json
  • 13:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53761 and previous config saved to /var/cache/conftool/dbconfig/20231123-132619-arnaudb.json
  • 13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53760 and previous config saved to /var/cache/conftool/dbconfig/20231123-132547-arnaudb.json
  • 13:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53759 and previous config saved to /var/cache/conftool/dbconfig/20231123-132250-arnaudb.json
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53758 and previous config saved to /var/cache/conftool/dbconfig/20231123-132215-arnaudb.json
  • 13:21 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53757 and previous config saved to /var/cache/conftool/dbconfig/20231123-131114-arnaudb.json
  • 13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53756 and previous config saved to /var/cache/conftool/dbconfig/20231123-131042-arnaudb.json
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53755 and previous config saved to /var/cache/conftool/dbconfig/20231123-130745-arnaudb.json
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53754 and previous config saved to /var/cache/conftool/dbconfig/20231123-130710-arnaudb.json
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53753 and previous config saved to /var/cache/conftool/dbconfig/20231123-125609-arnaudb.json
  • 12:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53752 and previous config saved to /var/cache/conftool/dbconfig/20231123-125537-arnaudb.json
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53751 and previous config saved to /var/cache/conftool/dbconfig/20231123-125240-arnaudb.json
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53750 and previous config saved to /var/cache/conftool/dbconfig/20231123-125205-arnaudb.json
  • 12:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2181.codfw.wmnet onto db2195.codfw.wmnet
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53749 and previous config saved to /var/cache/conftool/dbconfig/20231123-124104-arnaudb.json
  • 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 12:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53748 and previous config saved to /var/cache/conftool/dbconfig/20231123-123700-arnaudb.json
  • 12:30 vgutierrez: depooling ncredir4001 till puppet is fixed
  • 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53747 and previous config saved to /var/cache/conftool/dbconfig/20231123-122559-arnaudb.json
  • 12:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53746 and previous config saved to /var/cache/conftool/dbconfig/20231123-122155-arnaudb.json
  • 12:19 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 12:11 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53745 and previous config saved to /var/cache/conftool/dbconfig/20231123-121054-arnaudb.json
  • 12:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2146.codfw.wmnet onto db2188.codfw.wmnet
  • 12:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53744 and previous config saved to /var/cache/conftool/dbconfig/20231123-120650-arnaudb.json
  • 11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53743 and previous config saved to /var/cache/conftool/dbconfig/20231123-115145-arnaudb.json
  • 11:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53742 and previous config saved to /var/cache/conftool/dbconfig/20231123-113640-arnaudb.json
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lists1004.eqiad.wmnet
  • 11:25 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 11:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2175.codfw.wmnet onto db2189.codfw.wmnet
  • 11:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lists1004.eqiad.wmnet
  • 11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53741 and previous config saved to /var/cache/conftool/dbconfig/20231123-112135-arnaudb.json
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53740 and previous config saved to /var/cache/conftool/dbconfig/20231123-110630-arnaudb.json
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: orchestrator
  • 10:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 10:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: orchestrator
  • 10:50 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2181.codfw.wmnet onto db2195.codfw.wmnet
  • 10:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2181 in db2195 for T343674', diff saved to https://phabricator.wikimedia.org/P53739 and previous config saved to /var/cache/conftool/dbconfig/20231123-104724-arnaudb.json
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:34 stevemunene@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host druid1008.eqiad.wmnet with OS bullseye
  • 10:31 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2146.codfw.wmnet onto db2188.codfw.wmnet
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: swift::proxy
  • 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2146 in db2188 for T343674', diff saved to https://phabricator.wikimedia.org/P53738 and previous config saved to /var/cache/conftool/dbconfig/20231123-102840-arnaudb.json
  • 10:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:22 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 10:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: swift::proxy
  • 10:09 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2175.codfw.wmnet onto db2189.codfw.wmnet
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2175 in db2189 for T343674', diff saved to https://phabricator.wikimedia.org/P53737 and previous config saved to /var/cache/conftool/dbconfig/20231123-100638-arnaudb.json
  • 10:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 09:59 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:19 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:18 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:18 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:17 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2149 in db2190 for T343674', diff saved to https://phabricator.wikimedia.org/P53736 and previous config saved to /var/cache/conftool/dbconfig/20231123-091514-arnaudb.json
  • 09:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:12 godog: add 50G to prometheus/services in codfw
  • 09:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 09:10 godog: add 80G to prometheus/k8s in eqiad
  • 08:49 Emperor: powercycle titan1001
  • 08:45 moritzm: powercycling titan1002
  • 08:37 hashar: Restarting CI Jenkins for plugins removals
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bookworm
  • 07:19 _joe_: restarted sirenbot
  • 07:08 hashar: Restarted CI Jenkins to upgrade Rebuilder plugin
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 06:53 hashar: Restarting Gerrit
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bookworm
  • 06:50 hashar: Restarting CI Jenkins for plugins removals
  • 06:44 marostegui: Failover m2 from db1195 to db1119 - T351638
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switch
  • 06:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switch
  • 06:23 hashar: Restarting CI Jenkins for plugin update # T282893
  • 02:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2028.codfw.wmnet with OS bullseye
  • 01:26 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet2003.codfw.wmnet with OS bookworm
  • 01:26 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 01:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 01:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2033']
  • 01:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2033']
  • 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 00:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 00:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet2003.codfw.wmnet with OS bookworm
  • 00:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host planet2003.codfw.wmnet
  • 00:28 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet2003.codfw.wmnet with OS bookworm
  • 00:20 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 00:19 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 00:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED

2023-11-22

  • 23:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2035']
  • 23:58 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2035']
  • 23:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2034']
  • 23:58 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2034']
  • 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2033']
  • 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2033']
  • 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2032']
  • 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2032']
  • 23:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2031']
  • 23:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2031']
  • 23:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2030']
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2030']
  • 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2029']
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2029']
  • 23:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2029']
  • 23:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2029']
  • 23:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2028']
  • 23:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2028']
  • 23:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS bullseye
  • 23:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 23:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 23:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 22:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ganeti servers in codfw - jhancock@cumin2002"
  • 22:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ganeti servers in codfw - jhancock@cumin2002"
  • 22:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:43 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2010.codfw.wmnet with OS bullseye
  • 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet2003.codfw.wmnet with OS bookworm
  • 22:41 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:40 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) planet2003.codfw.wmnet on all recursors
  • 22:40 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache planet2003.codfw.wmnet on all recursors
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:37 jhathaway: my latest commit may have broken puppet-merge; I'm investigating
  • 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet2003.codfw.wmnet
  • 22:34 mutante: puppetserver1001 - manually signed puppet cert request for planet1003
  • 22:25 ebernhardson: start cirrus updater backfilling into relforge
  • 22:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 22:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 22:20 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:20 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2010.codfw.wmnet with OS bullseye
  • 22:18 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host planet1003.eqiad.wmnet
  • 22:08 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 22:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1175.eqiad.wmnet with OS bullseye
  • 22:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS bullseye
  • 21:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 21:44 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 21:37 catrope@deploy2002: Finished scap: Backport for Update Annual Plan Core Metrics survey (T351353) (duration: 09m 04s)
  • 21:30 catrope@deploy2002: catrope and dani: Continuing with sync
  • 21:30 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS bullseye
  • 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2008.codfw.wmnet
  • 21:29 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2008.codfw.wmnet
  • 21:29 catrope@deploy2002: catrope and dani: Backport for Update Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS bullseye
  • 21:28 catrope@deploy2002: Started scap: Backport for Update Annual Plan Core Metrics survey (T351353)
  • 21:24 catrope@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393) (duration: 21m 43s)
  • 21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 21:18 catrope@deploy2002: catrope and dani: Continuing with sync
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 21:11 catrope@deploy2002: catrope and dani: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 21:04 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 21:03 catrope@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393)
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:51 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) planet1003.eqiad.wmnet on all recursors
  • 20:50 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache planet1003.eqiad.wmnet on all recursors
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:50 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
  • 20:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1175.eqiad.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2007.codfw.wmnet
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2007.codfw.wmnet
  • 20:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS bullseye
  • 20:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53735 and previous config saved to /var/cache/conftool/dbconfig/20231122-202947-arnaudb.json
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53734 and previous config saved to /var/cache/conftool/dbconfig/20231122-201441-arnaudb.json
  • 20:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 20:11 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53733 and previous config saved to /var/cache/conftool/dbconfig/20231122-195934-arnaudb.json
  • 19:58 ejegg: standalone (payments listener) SmashPig upgraded from b867e553 to f24afba3
  • 19:56 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS bullseye
  • 19:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2006.codfw.wmnet
  • 19:55 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2006.codfw.wmnet
  • 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS bullseye
  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53732 and previous config saved to /var/cache/conftool/dbconfig/20231122-194428-arnaudb.json
  • 19:37 ejegg: fundraising civicrm upgraded from 3c5db93b to f3de1778
  • 19:36 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 19:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS bullseye
  • 19:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2005.codfw.wmnet
  • 19:05 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2005.codfw.wmnet
  • 18:58 ejegg: standalone SmashPig upgraded from c5b12dc3 to b867e553
  • 18:57 cstone: payments-wiki upgraded from 714552c5 to f02f8653
  • 18:50 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 18:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS bullseye
  • 17:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 17:42 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 17:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1175']
  • 17:29 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1175']
  • 17:28 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:28 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1175.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS bullseye
  • 17:27 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:27 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2004.codfw.wmnet
  • 17:26 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2004.codfw.wmnet
  • 17:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:24 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS bullseye
  • 17:23 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 17:21 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:21 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:07 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 17:01 fabfur: swapped cp1113 <-> cp1088 (T349244)
  • 16:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1113.eqiad.wmnet
  • 16:57 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1113.eqiad.wmnet
  • 16:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 16:56 Emperor: repool ms-fe1014 with new envoy TLS setup T317616
  • 16:55 volans: installed spicerack v8.2.0 to the cumin hosts
  • 16:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 16:48 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 16:47 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 16:46 Emperor: repool moss-fe2001 with new envoy TLS setup T317616
  • 16:45 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 16:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS bullseye
  • 16:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2003.codfw.wmnet
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2003.codfw.wmnet
  • 16:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS bullseye
  • 16:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 16:35 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 16:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:31 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mediabackup::worker
  • 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:25 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:24 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mediabackup::worker
  • 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mediabackup::storage
  • 16:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
  • 16:18 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 16:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 Emperor: depool ms-fe1014 to reimage with new envoy TLS setup T317616
  • 16:15 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
  • 16:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:15 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:15 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 16:14 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 Emperor: repool moss-fe1001 with new envoy TLS setup T317616
  • 16:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mediabackup::storage
  • 16:13 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:13 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:12 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::metadata
  • 16:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:09 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 sukhe: enable Puppet on A:lvs to merge CR 976312 and run agent
  • 16:08 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:05 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::metadata
  • 16:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::content
  • 16:05 sukhe: disable Puppet on A:lvs to merge CR 976312
  • 16:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:02 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:02 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
  • 16:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2003.codfw.wmnet with OS bullseye
  • 15:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::content
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::production
  • 15:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:55 moritzm: installing dpkg bugfix updates on bullseye
  • 15:55 fabfur@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 15:54 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:53 fabfur@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 15:53 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding cp1113 back with correct VLAN - fabfur@cumin1001"
  • 15:52 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding cp1113 back with correct VLAN - fabfur@cumin1001"
  • 15:48 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host testreduce1002.eqiad.wmnet
  • 15:46 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:46 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:45 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
  • 15:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2002.codfw.wmnet
  • 15:45 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::production
  • 15:45 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2002.codfw.wmnet
  • 15:44 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::es
  • 15:43 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:42 volans: uploaded spicerack_8.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:40 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1175.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:36 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::es
  • 15:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS bullseye
  • 15:35 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::databases
  • 15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host testreduce1002.eqiad.wmnet
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1175 - jclark@cumin1001"
  • 15:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1175 - jclark@cumin1001"
  • 15:30 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::databases
  • 15:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:28 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:28 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:28 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:17 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:15 moritzm: installing python3.7 security updates
  • 15:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
  • 15:11 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host wdqs2008.codfw.wmnet
  • 15:07 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp1113.eqiad.wmnet
  • 15:06 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:06 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:04 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:03 Emperor: depool moss-fe1001 to reimage with new envoy TLS setup T317616
  • 15:02 Emperor: depool moss-fe2001 to reimage with new envoy TLS setup T317616
  • 15:01 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 jayme: uncordoned and repooled kubernetes1013
  • 15:00 Emperor: repool ms-fe1009 with new envoy TLS setup T317616
  • 14:59 Emperor: repool ms-fe2009 with new envoy TLS setup T317616
  • 14:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host wdqs2008.codfw.wmnet
  • 14:57 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS bullseye
  • 14:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
  • 14:56 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS bullseye
  • 14:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 14:50 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp1113.eqiad.wmnet
  • 14:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
  • 14:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 14:35 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 14:35 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 14:34 fabfur: start re-provisioning and re-imaging cp1113 to fix wrong subnet (T342159)
  • 14:34 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
  • 14:32 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 14:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:30 urandom: restarting Cassandra, sessionstore2001 (post-Puppet 7 migration)
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53726 and previous config saved to /var/cache/conftool/dbconfig/20231122-142312-arnaudb.json
  • 14:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 14:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53725 and previous config saved to /var/cache/conftool/dbconfig/20231122-142301-arnaudb.json
  • 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
  • 14:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS bullseye
  • 14:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2001.codfw.wmnet
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2001.codfw.wmnet
  • 14:19 Emperor: depool ms-fe2009 to reimage with new envoy TLS setup T317616
  • 14:19 Emperor: depool ms-fe1009 to reimage with new envoy TLS setup T317616
  • 14:14 Emperor: repool ms-fe2010 with new envoy TLS setup T317616
  • 14:14 Emperor: repool ms-fe1010 with new envoy TLS setup T317616
  • 14:12 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host sessionstore2001.codfw.wmnet
  • 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53724 and previous config saved to /var/cache/conftool/dbconfig/20231122-140754-arnaudb.json
  • 14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS bullseye
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2001.codfw.wmnet
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1010.eqiad.wmnet with OS bullseye
  • 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53723 and previous config saved to /var/cache/conftool/dbconfig/20231122-135248-arnaudb.json
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host logstash2001.codfw.wmnet
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2023.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1475.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1474.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1472.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1473.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2425.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2431.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2421.codfw.wmnet
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2010.codfw.wmnet with reason: host reimage
  • 13:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1010.eqiad.wmnet with reason: host reimage
  • 13:43 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2010.codfw.wmnet with reason: host reimage
  • 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host logstash2023.codfw.wmnet
  • 13:41 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1010.eqiad.wmnet with reason: host reimage
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53722 and previous config saved to /var/cache/conftool/dbconfig/20231122-133741-arnaudb.json
  • 13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS bullseye
  • 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1010.eqiad.wmnet with OS bullseye
  • 13:27 Emperor: depool ms-fe2010 to reimage with new envoy TLS setup T317616
  • 13:27 Emperor: depool ms-fe1010 to reimage with new envoy TLS setup T317616
  • 13:23 Emperor: repool ms-fe2011 with new envoy TLS setup T317616
  • 13:22 Emperor: repool ms-fe1011 with new envoy TLS setup T317616
  • 13:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS bullseye
  • 13:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1011.eqiad.wmnet with OS bullseye
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host logstash2023.codfw.wmnet
  • 13:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2011.codfw.wmnet with reason: host reimage
  • 13:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1011.eqiad.wmnet with reason: host reimage
  • 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2011.codfw.wmnet with reason: host reimage
  • 13:02 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:02 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host logstash2023.codfw.wmnet
  • 13:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host logstash2001.codfw.wmnet
  • 13:00 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:59 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1011.eqiad.wmnet with reason: host reimage
  • 12:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:59 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 12:59 claime: Raising mw-web and mw-api-ext replicas for traffic bump - T348122
  • 12:58 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2420.codfw.wmnet
  • 12:55 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1474.eqiad.wmnet
  • 12:55 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1475.eqiad.wmnet
  • 12:54 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1473.eqiad.wmnet
  • 12:54 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1472.eqiad.wmnet
  • 12:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1001.eqiad.wmnet with OS bullseye
  • 12:52 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2431.codfw.wmnet
  • 12:52 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2425.codfw.wmnet
  • 12:51 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2421.codfw.wmnet
  • 12:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host logstash2001.codfw.wmnet
  • 12:49 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2420.codfw.wmnet
  • 12:48 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS bullseye
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1011.eqiad.wmnet with OS bullseye
  • 12:45 Emperor: depool ms-fe1011 to reimage with new envoy TLS setup T317616
  • 12:45 Emperor: depool ms-fe2011 to reimage with new envoy TLS setup T317616
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 12:35 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1001.eqiad.wmnet with reason: host reimage
  • 12:32 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1001.eqiad.wmnet with reason: host reimage
  • 12:31 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host mc-gp1001.eqiad.wmnet
  • 12:27 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb1002.eqiad.wmnet
  • 12:19 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1001.eqiad.wmnet with OS bullseye
  • 12:15 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 12:15 Emperor: repool ms-fe1012 with new envoy TLS setup T317616
  • 12:15 Emperor: repool ms-fe2012 with new envoy TLS setup T317616
  • 12:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS bullseye
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 11:56 hashar: Restarting Gerrit
  • 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2012.codfw.wmnet with reason: host reimage
  • 11:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2012.codfw.wmnet with reason: host reimage
  • 11:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 11:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::postgresql
  • 11:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS bullseye
  • 11:35 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 11:34 Emperor: depool ms-fe2012 to reimage with new envoy TLS setup T317616
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::postgresql
  • 11:33 Emperor: depool ms-fe1012 to reimage with new envoy TLS setup T317616
  • 11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53719 and previous config saved to /var/cache/conftool/dbconfig/20231122-112649-arnaudb.json
  • 11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53718 and previous config saved to /var/cache/conftool/dbconfig/20231122-112641-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53717 and previous config saved to /var/cache/conftool/dbconfig/20231122-111144-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53716 and previous config saved to /var/cache/conftool/dbconfig/20231122-111136-arnaudb.json
  • 10:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53715 and previous config saved to /var/cache/conftool/dbconfig/20231122-105639-arnaudb.json
  • 10:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53714 and previous config saved to /var/cache/conftool/dbconfig/20231122-105631-arnaudb.json
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-db1002.eqiad.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-db1002.eqiad.wmnet
  • 10:43 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53713 and previous config saved to /var/cache/conftool/dbconfig/20231122-104134-arnaudb.json
  • 10:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53712 and previous config saved to /var/cache/conftool/dbconfig/20231122-104126-arnaudb.json
  • 10:33 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:33 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:33 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: swift::storage
  • 10:31 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:27 Emperor: repool ms-fe2013 with new envoy TLS setup T317616
  • 10:26 Emperor: repool ms-fe1013 with new envoy TLS setup T317616
  • 10:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53711 and previous config saved to /var/cache/conftool/dbconfig/20231122-102629-arnaudb.json
  • 10:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53710 and previous config saved to /var/cache/conftool/dbconfig/20231122-102621-arnaudb.json
  • 10:25 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:25 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:25 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:24 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@0cca675] (releasing): (no justification provided) (duration: 00m 40s)
  • 10:23 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@0cca675] (releasing): (no justification provided)
  • 10:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs (T351069)
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53709 and previous config saved to /var/cache/conftool/dbconfig/20231122-101124-arnaudb.json
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53708 and previous config saved to /var/cache/conftool/dbconfig/20231122-101116-arnaudb.json
  • 10:07 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: swift::storage
  • 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53707 and previous config saved to /var/cache/conftool/dbconfig/20231122-095619-arnaudb.json
  • 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53706 and previous config saved to /var/cache/conftool/dbconfig/20231122-095611-arnaudb.json
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs (T351069)
  • 09:53 vgutierrez: rolling restart of pybal to catch up on a NOOP config update - T351069
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host wcqs2001.codfw.wmnet
  • 09:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 09:47 elukey: Update of the profile::base::certificate's CA bundle fleet wide (https://gerrit.wikimedia.org/r/c/operations/puppet/+/976659)
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53705 and previous config saved to /var/cache/conftool/dbconfig/20231122-094114-arnaudb.json
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53704 and previous config saved to /var/cache/conftool/dbconfig/20231122-094106-arnaudb.json
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::main
  • 09:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 09:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host wcqs2001.codfw.wmnet
  • 09:34 elukey@cumin1001: START - Cookbook sre.puppet.migrate-role for role: kafka::main
  • 09:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 09:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2013.codfw.wmnet with OS bullseye
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53703 and previous config saved to /var/cache/conftool/dbconfig/20231122-092609-arnaudb.json
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53702 and previous config saved to /var/cache/conftool/dbconfig/20231122-092601-arnaudb.json
  • 09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53701 and previous config saved to /var/cache/conftool/dbconfig/20231122-091104-arnaudb.json
  • 09:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53700 and previous config saved to /var/cache/conftool/dbconfig/20231122-091056-arnaudb.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gerrit
  • 09:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2041.codfw.wmnet
  • 09:04 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2041.codfw.wmnet
  • 09:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gerrit
  • 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2013.codfw.wmnet with OS bullseye
  • 09:00 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 08:59 Emperor: depool ms-fe1013 to reimage with new envoy TLS setup T317616
  • 08:59 Emperor: depool ms-fe2013 to reimage with new envoy TLS setup T317616
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53699 and previous config saved to /var/cache/conftool/dbconfig/20231122-084943-root.json
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: titan
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53698 and previous config saved to /var/cache/conftool/dbconfig/20231122-083438-root.json
  • 08:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: titan
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53697 and previous config saved to /var/cache/conftool/dbconfig/20231122-081933-root.json
  • 08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53696 and previous config saved to /var/cache/conftool/dbconfig/20231122-081912-arnaudb.json
  • 08:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: logging::mediawiki::udp2log
  • 08:14 kartik@deploy2002: Finished scap: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267) (duration: 08m 46s)
  • 08:09 kartik@deploy2002: kartik: Continuing with sync
  • 08:07 kartik@deploy2002: kartik: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 kartik@deploy2002: Started scap: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267)
  • 08:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: logging::mediawiki::udp2log
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53695 and previous config saved to /var/cache/conftool/dbconfig/20231122-080428-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53694 and previous config saved to /var/cache/conftool/dbconfig/20231122-074923-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53693 and previous config saved to /var/cache/conftool/dbconfig/20231122-072247-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to test 10.4.32 T351283', diff saved to https://phabricator.wikimedia.org/P53692 and previous config saved to /var/cache/conftool/dbconfig/20231122-071911-root.json
  • 07:07 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" (duration: 08m 10s)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53691 and previous config saved to /var/cache/conftool/dbconfig/20231122-070742-root.json
  • 07:02 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:01 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:59 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master"
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bookworm
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53690 and previous config saved to /var/cache/conftool/dbconfig/20231122-065238-root.json
  • 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2012.codfw.wmnet with reason: host reimage
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: host reimage
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53689 and previous config saved to /var/cache/conftool/dbconfig/20231122-063733-root.json
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bookworm
  • 06:22 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620) (duration: 07m 28s)
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53688 and previous config saved to /var/cache/conftool/dbconfig/20231122-062228-root.json
  • 06:17 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:16 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 06:15 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620)
  • 04:25 eileen: civicrm upgraded from 43d191c8 to 3c5db93b
  • 02:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 01:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1158']
  • 01:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 00:55 tstarling@deploy2002: Synchronized wmf-config/CommonSettings.php: enable LoginNotify seen subnets table g965663 T346989 (duration: 06m 23s)
  • 00:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye

2023-11-21

  • 23:05 vriley@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:05 vriley@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 23:04 vriley@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 23:02 vriley@cumin1001: START - Cookbook sre.dns.netbox
  • 23:02 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1036
  • 23:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1038
  • 23:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1037
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1038
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1037
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1036
  • 22:59 vriley@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:59 vriley@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 22:58 vriley@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1021.eqiad.wmnet
  • 22:54 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1021.eqiad.wmnet
  • 22:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:48 vriley@cumin1001: START - Cookbook sre.dns.netbox
  • 22:47 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1035
  • 22:46 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1035
  • 22:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
  • 22:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53687 and previous config saved to /var/cache/conftool/dbconfig/20231121-223053-arnaudb.json
  • 22:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
  • 22:23 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:18 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:18 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:17 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:15 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53686 and previous config saved to /var/cache/conftool/dbconfig/20231121-221547-arnaudb.json
  • 22:15 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:12 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:11 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:08 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS bullseye
  • 22:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53685 and previous config saved to /var/cache/conftool/dbconfig/20231121-220040-arnaudb.json
  • 21:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53684 and previous config saved to /var/cache/conftool/dbconfig/20231121-214534-arnaudb.json
  • 21:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
  • 21:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
  • 21:35 catrope@deploy2002: Sync cancelled.
  • 21:31 catrope@deploy2002: ssastry and catrope: Backport for ParserOutputPostCacheTransform: Don't reprocess content (T351461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 catrope@deploy2002: Started scap: Backport for ParserOutputPostCacheTransform: Don't reprocess content (T351461)
  • 21:29 catrope@deploy2002: Finished scap: Backport for [parser] Broaden TOC placeholder regular expression (duration: 12m 40s)
  • 21:23 catrope@deploy2002: catrope and ssastry: Continuing with sync
  • 21:18 catrope@deploy2002: catrope and ssastry: Backport for [parser] Broaden TOC placeholder regular expression synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 catrope@deploy2002: Started scap: Backport for [parser] Broaden TOC placeholder regular expression
  • 21:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
  • 21:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
  • 20:53 mutante: gerrit1003 - deleted /root/backup_of_srv_gerrit_plugins - disk usage down to 56% (T351658)
  • 20:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
  • 20:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53683 and previous config saved to /var/cache/conftool/dbconfig/20231121-204501-arnaudb.json
  • 20:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 20:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 20:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53682 and previous config saved to /var/cache/conftool/dbconfig/20231121-204440-arnaudb.json
  • 20:41 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:41 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1018.eqiad.wmnet
  • 20:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1018.eqiad.wmnet
  • 20:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P53681 and previous config saved to /var/cache/conftool/dbconfig/20231121-202933-arnaudb.json
  • 20:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P53680 and previous config saved to /var/cache/conftool/dbconfig/20231121-201427-arnaudb.json
  • 20:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
  • 20:02 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53679 and previous config saved to /var/cache/conftool/dbconfig/20231121-195920-arnaudb.json
  • 19:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
  • 19:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
  • 19:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 19:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
  • 19:26 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
  • 19:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
  • 19:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
  • 18:42 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part III (T351675) (duration: 25m 41s)
  • 18:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 18:30 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:29 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part III (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 18:16 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part III (T351675)
  • 18:15 jynus: restart of bacula-sd on backup1009 T351725
  • 18:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1158']
  • 18:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1158']
  • 18:11 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part II (T351675) (duration: 08m 24s)
  • 18:06 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:04 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part II (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:03 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part II (T351675)
  • 18:02 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:02 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part I (T351675) (duration: 08m 27s)
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating restbase servers in codfw - jhancock@cumin2002"
  • 17:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating restbase servers in codfw - jhancock@cumin2002"
  • 17:56 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:56 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:54 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part I (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1174']
  • 17:53 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1174']
  • 17:53 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part I (T351675)
  • 17:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1173']
  • 17:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1170']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1169']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1168']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1167']
  • 17:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1174']
  • 17:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1173']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1166']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1170']
  • 17:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1164']
  • 17:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1165']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1169']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1168']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1167']
  • 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1160']
  • 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1159']
  • 17:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1163']
  • 17:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1166']
  • 17:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1165']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1164']
  • 17:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1162']
  • 17:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1161']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1163']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1162']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1161']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1159']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1159']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1159']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1173']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1170']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1169']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1168']
  • 17:19 ejegg: fundraising civicrm upgraded from 3a8558e7 to 43d191c8
  • 17:16 ejegg: payments-wiki upgraded from 56790715 to 714552c5
  • 17:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1174']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1173']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1170']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1169']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1168']
  • 17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:00 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:47 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1017-1019].eqiad.wmnet} and A:lvs (T351069)
  • 16:38 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:36 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1017-1019].eqiad.wmnet} and A:lvs (T351069)
  • 16:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020.eqiad.wmnet} and A:lvs (T351069)
  • 16:34 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020.eqiad.wmnet} and A:lvs (T351069)
  • 16:33 vgutierrez: updating pybal to 1.5.14 on eqiad - T351069
  • 16:11 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kafka-main1001.eqiad.wmnet
  • 16:07 elukey@cumin1001: START - Cookbook sre.puppet.migrate-host for host kafka-main1001.eqiad.wmnet
  • 16:05 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 16:05 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 16:03 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate cloud-private svc ips to wiki replicas - taavi@cumin1001"
  • 16:02 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate cloud-private svc ips to wiki replicas - taavi@cumin1001"
  • 15:59 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:51 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::wmcs
  • 15:47 fabfur: repooled cp1088
  • 15:44 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: insetup::wmcs
  • 15:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::cloudgw
  • 15:37 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::cloudgw
  • 15:34 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::cloudlb
  • 15:27 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
  • 15:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::cloudlb
  • 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::db::wikireplicas::analytics_multiinstance
  • 15:23 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
  • 15:21 fabfur: depooled cp1113
  • 15:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::db::wikireplicas::analytics_multiinstance
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::db::wikireplicas::web_multiinstance
  • 15:12 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
  • 15:07 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::db::wikireplicas::web_multiinstance
  • 15:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::distribution::server
  • 15:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1015.eqiad.wmnet
  • 15:04 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1015.eqiad.wmnet
  • 15:04 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1015.eqiad.wmnet with OS bullseye
  • 15:00 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dumps::distribution::server
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::cloud_management
  • 14:53 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:53 hnowlan@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: cluster::cloud_management
  • 14:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1015.eqiad.wmnet with reason: host reimage
  • 14:45 Lucas_WMDE: T350224 maintenance script finished (8m46s real time)
  • 14:44 fabfur: swapped cp1113 <-> cp1088 (T349244)
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: host reimage
  • 14:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1113.eqiad.wmnet
  • 14:43 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1113.eqiad.wmnet
  • 14:42 fabfur: swapped cp1112 <-> cp1087 (T349244)
  • 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1112.eqiad.wmnet
  • 14:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1112.eqiad.wmnet
  • 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[3008-3009].esams.wmnet} and A:lvs (T351069)
  • 14:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53676 and previous config saved to /var/cache/conftool/dbconfig/20231121-143640-arnaudb.json
  • 14:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53675 and previous config saved to /var/cache/conftool/dbconfig/20231121-143619-arnaudb.json
  • 14:36 Lucas_WMDE: START [in tmux] lucaswerkmeister-wmde@mwmaint2002:~$ mwscript Wikibase.Lexeme.Maintenance.FixPagePropsSortkey wikidatawiki --batch-size=1000 # T350224
  • 14:35 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[3008-3009].esams.wmnet} and A:lvs (T351069)
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs3010.esams.wmnet} and A:lvs (T351069)
  • 14:34 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs3010.esams.wmnet} and A:lvs (T351069)
  • 14:32 vgutierrez: updating pybal to 1.5.14 on esams - T351069
  • 14:32 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1015.eqiad.wmnet with OS bullseye
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1015.eqiad.wmnet
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1015.eqiad.wmnet
  • 14:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P53674 and previous config saved to /var/cache/conftool/dbconfig/20231121-142112-arnaudb.json
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for mc: Make it possible to use mcrouter server set by environment (T346690) (duration: 07m 09s)
  • 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[6001-6002].drmrs.wmnet} and A:lvs (T351069)
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:08 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[6001-6002].drmrs.wmnet} and A:lvs (T351069)
  • 14:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs6003.drmrs.wmnet} and A:lvs (T351069)
  • 14:07 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs6003.drmrs.wmnet} and A:lvs (T351069)
  • 14:07 godog: revert rsyslog upgrade on centrallog2002 - T351710
  • 14:06 vgutierrez: updating pybal to 1.5.14 on drmrs - T351069
  • 14:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P53673 and previous config saved to /var/cache/conftool/dbconfig/20231121-140606-arnaudb.json
  • 14:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and d3r1ck01: Continuing with sync
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and d3r1ck01: Backport for mc: Make it possible to use mcrouter server set by environment (T346690) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for mc: Make it possible to use mcrouter server set by environment (T346690)
  • 13:58 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:57 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:56 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53672 and previous config saved to /var/cache/conftool/dbconfig/20231121-135059-arnaudb.json
  • 13:49 godog: test upgrade rsyslog on centrallog2002 - T351710
  • 13:38 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::virt_ceph
  • 13:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:32 Emperor: repool ms-fe2014 with new envoy TLS setup T317616
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:22 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::virt_ceph
  • 13:22 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::virt
  • 13:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::virt
  • 13:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 13:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 13:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::services
  • 12:57 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 12:56 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 12:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:52 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::services
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::rabbitmq
  • 12:42 awight@deploy2002: Finished scap: Backport for Revert "Revert "Enable Reference Previews on all wikis"" (duration: 08m 12s)
  • 12:40 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::rabbitmq
  • 12:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:36 awight@deploy2002: awight: Continuing with sync
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::net
  • 12:35 awight@deploy2002: awight: Backport for Revert "Revert "Enable Reference Previews on all wikis"" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:34 awight@deploy2002: Started scap: Backport for Revert "Revert "Enable Reference Previews on all wikis""
  • 12:31 awight@deploy2002: Sync cancelled.
  • 12:28 awight@deploy2002: wmde-fisch and awight: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:27 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 12:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::net
  • 12:22 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::control
  • 12:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:18 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2014.codfw.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:01 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::control
  • 11:59 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::cinder_backups
  • 11:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 11:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 11:51 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::cinder_backups
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host titan2002.codfw.wmnet
  • 11:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2041.codfw.wmnet with reason: NIC 1 Port 1 network link is down
  • 11:37 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2041.codfw.wmnet with reason: NIC 1 Port 1 network link is down
  • 11:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host titan2002.codfw.wmnet
  • 11:22 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2014.codfw.wmnet with OS bullseye
  • 11:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:20 Emperor: depool ms-fe2014 to reimage with new envoy TLS setup T317616
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mwlog2002.codfw.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host mwlog2002.codfw.wmnet
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gitlab_runner
  • 10:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gitlab_runner
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gerrit2002.wikimedia.org
  • 10:35 jbond: upload new wmf-certificates packages
  • 10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gerrit2002.wikimedia.org
  • 10:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 10:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53670 and previous config saved to /var/cache/conftool/dbconfig/20231121-100607-arnaudb.json
  • 10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53669 and previous config saved to /var/cache/conftool/dbconfig/20231121-100536-arnaudb.json
  • 10:03 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:02 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:02 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:01 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gitlab-runner1002.eqiad.wmnet
  • 10:00 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:53 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:51 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53667 and previous config saved to /var/cache/conftool/dbconfig/20231121-095102-arnaudb.json
  • 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53666 and previous config saved to /var/cache/conftool/dbconfig/20231121-095031-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53665 and previous config saved to /var/cache/conftool/dbconfig/20231121-093557-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53664 and previous config saved to /var/cache/conftool/dbconfig/20231121-093526-arnaudb.json
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[2011-2013].codfw.wmnet} and A:lvs (T351069)
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53663 and previous config saved to /var/cache/conftool/dbconfig/20231121-092052-arnaudb.json
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53662 and previous config saved to /var/cache/conftool/dbconfig/20231121-092021-arnaudb.json
  • 09:19 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[2011-2013].codfw.wmnet} and A:lvs (T351069)
  • 09:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.codfw.wmnet} and A:lvs (T351069)
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.codfw.wmnet} and A:lvs (T351069)
  • 09:17 vgutierrez: updating pybal to 1.5.14 on codfw - T351069
  • 09:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[5004-5005].eqsin.wmnet} and A:lvs (T351069)
  • 09:16 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[5004-5005].eqsin.wmnet} and A:lvs (T351069)
  • 09:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs5006.eqsin.wmnet} and A:lvs (T351069)
  • 09:15 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs5006.eqsin.wmnet} and A:lvs (T351069)
  • 09:14 vgutierrez: updating pybal to 1.5.14 on eqsin - T351069
  • 09:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[4008-4009].ulsfo.wmnet} and A:lvs (T351069)
  • 09:10 vgutierrez: updating pybal to 1.5.14 on ulsfo - T351069
  • 09:09 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[4008-4009].ulsfo.wmnet} and A:lvs (T351069)
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53661 and previous config saved to /var/cache/conftool/dbconfig/20231121-090547-arnaudb.json
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53660 and previous config saved to /var/cache/conftool/dbconfig/20231121-090516-arnaudb.json
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53659 and previous config saved to /var/cache/conftool/dbconfig/20231121-085042-arnaudb.json
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53658 and previous config saved to /var/cache/conftool/dbconfig/20231121-085011-arnaudb.json
  • 08:41 vgutierrez: updating pybal to 1.5.14 on lvs4010 - T351069
  • 08:38 awight: scap window cancelled due to k8s error
  • 08:37 awight@deploy2002: Finished scap: Backport for Revert "Enable Reference Previews on all wikis" (duration: 07m 08s)
  • 08:37 vgutierrez: upload pybal 1.15.14 to apt.wm.o (bullseye-wikimedia) - T348837
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53657 and previous config saved to /var/cache/conftool/dbconfig/20231121-083537-arnaudb.json
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53656 and previous config saved to /var/cache/conftool/dbconfig/20231121-083504-arnaudb.json
  • 08:32 awight@deploy2002: awight and trainbranchbot: Continuing with sync
  • 08:31 awight@deploy2002: awight and trainbranchbot: Backport for Revert "Enable Reference Previews on all wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:31 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: ncredir
  • 08:30 awight@deploy2002: Started scap: Backport for Revert "Enable Reference Previews on all wikis"
  • 08:28 awight@deploy2002: Sync cancelled.
  • 08:28 awight@deploy2002: wmde-fisch and awight: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:27 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 08:24 awight@deploy2002: Finished scap: Backport for Enable Reference Previews on all wikis (T282999) (duration: 15m 02s)
  • 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53655 and previous config saved to /var/cache/conftool/dbconfig/20231121-082032-arnaudb.json
  • 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53654 and previous config saved to /var/cache/conftool/dbconfig/20231121-082000-arnaudb.json
  • 08:18 awight@deploy2002: awight and wmde-fisch: Continuing with sync
  • 08:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1011.eqiad.wmnet with OS bullseye
  • 08:11 awight@deploy2002: awight and wmde-fisch: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:09 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53653 and previous config saved to /var/cache/conftool/dbconfig/20231121-080527-arnaudb.json
  • 08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53652 and previous config saved to /var/cache/conftool/dbconfig/20231121-080455-arnaudb.json
  • 07:52 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 07:50 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bookworm
  • 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 07:25 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
  • 07:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 07:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53651 and previous config saved to /var/cache/conftool/dbconfig/20231121-072424-arnaudb.json
  • 07:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53650 and previous config saved to /var/cache/conftool/dbconfig/20231121-072346-arnaudb.json
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bookworm
  • 07:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P53649 and previous config saved to /var/cache/conftool/dbconfig/20231121-070840-arnaudb.json
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33452
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33452
  • 06:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P53648 and previous config saved to /var/cache/conftool/dbconfig/20231121-065333-arnaudb.json
  • 06:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53647 and previous config saved to /var/cache/conftool/dbconfig/20231121-063827-arnaudb.json
  • 02:32 ejegg: fundraising civicrm upgraded from b3da5d3f to 3a8558e7
  • 01:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53646 and previous config saved to /var/cache/conftool/dbconfig/20231121-013514-arnaudb.json
  • 01:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance

2023-11-20

  • 22:46 catrope@deploy2002: Finished scap: Backport for [parsoid] Fix Parsoid relative links (T350952) (duration: 19m 32s)
  • 22:41 catrope@deploy2002: catrope and cscott: Continuing with sync
  • 22:28 catrope@deploy2002: catrope and cscott: Backport for [parsoid] Fix Parsoid relative links (T350952) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:27 catrope@deploy2002: Started scap: Backport for [parsoid] Fix Parsoid relative links (T350952)
  • 22:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entries for cloud hosts. - cmooney@cumin1001"
  • 22:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entries for cloud hosts. - cmooney@cumin1001"
  • 22:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:15 catrope@deploy2002: Finished scap: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622) (duration: 17m 40s)
  • 22:09 catrope@deploy2002: jdlrobson and catrope: Continuing with sync
  • 21:59 catrope@deploy2002: jdlrobson and catrope: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:58 catrope@deploy2002: Started scap: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622)
  • 21:38 catrope@deploy2002: Finished scap: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669) (duration: 22m 11s)
  • 21:32 catrope@deploy2002: catrope and jdlrobson: Continuing with sync
  • 21:17 catrope@deploy2002: catrope and jdlrobson: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 catrope@deploy2002: Started scap: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669)
  • 21:12 catrope@deploy2002: Finished scap: Backport for Enable action blocks in ruwiki (T351048) (duration: 08m 52s)
  • 21:06 catrope@deploy2002: catrope and stjn: Continuing with sync
  • 21:05 catrope@deploy2002: catrope and stjn: Backport for Enable action blocks in ruwiki (T351048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 catrope@deploy2002: Started scap: Backport for Enable action blocks in ruwiki (T351048)
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1014.eqiad.wmnet
  • 21:02 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1014.eqiad.wmnet
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: host reimage
  • 20:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: host reimage
  • 20:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53645 and previous config saved to /var/cache/conftool/dbconfig/20231120-203337-arnaudb.json
  • 20:21 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:21 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P53644 and previous config saved to /var/cache/conftool/dbconfig/20231120-201831-arnaudb.json
  • 20:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1013.eqiad.wmnet with OS bullseye
  • 20:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P53643 and previous config saved to /var/cache/conftool/dbconfig/20231120-200324-arnaudb.json
  • 19:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
  • 19:59 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: host reimage
  • 19:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53642 and previous config saved to /var/cache/conftool/dbconfig/20231120-194818-arnaudb.json
  • 19:48 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: host reimage
  • 19:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1013.eqiad.wmnet with OS bullseye
  • 19:21 sukhe: pool cp4045.ulsfo.wmnet post reboot and puppet 7 upgrade
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4045.ulsfo.wmnet
  • 19:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4045.ulsfo.wmnet
  • 19:04 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief2001.codfw.wmnet with OS bookworm
  • 19:03 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:02 sukhe: depool cp4045 for reboot
  • 18:59 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 18:59 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 18:59 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 18:59 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 18:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4045.ulsfo.wmnet
  • 18:48 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4045.ulsfo.wmnet
  • 18:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 18:41 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 18:39 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:38 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 18:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 brett@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief2001.codfw.wmnet with OS bookworm
  • 18:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wikidough
  • 18:18 volans: installed spicerack v8.1.0 on the cumin hosts
  • 18:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: wikidough
  • 18:08 ebernhardson: start test backfill of 4 days of itwiki and frwiki edits to relforge from cirrus updater
  • 18:06 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:37 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:32 volans: uploaded spicerack_8.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 17:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: durum
  • 17:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 17:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 17:18 hashar: Restarting Gerrit # T351658
  • 17:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: durum
  • 17:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:56 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237) (duration: 10m 06s)
  • 16:51 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:48 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:46 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237)
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210', diff saved to https://phabricator.wikimedia.org/P53638 and previous config saved to /var/cache/conftool/dbconfig/20231120-162648-root.json
  • 15:48 Lucas_WMDE: DONE Wikibase.Lexeme.Maintenance.FixPagePropsSortkey (T350224) in 1.079s real time :)
  • 15:48 fabfur: swapped cp1111 <-> cp1086 (T349244)
  • 15:48 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript Wikibase.Lexeme.Maintenance.FixPagePropsSortkey testwikidatawiki --batch-size=1000 # T350224
  • 15:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1111.eqiad.wmnet
  • 15:47 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1111.eqiad.wmnet
  • 15:44 fabfur: swapped cp1110 <-> cp1085 (T349244)
  • 15:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1110.eqiad.wmnet
  • 15:42 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:39 urbanecm: UTC afternoon B&C window done
  • 14:37 urbanecm@deploy2002: Finished scap: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224) (duration: 10m 28s)
  • 14:31 urbanecm@deploy2002: urbanecm and lucaswerkmeister-wmde and cyndywikime: Continuing with sync
  • 14:28 urbanecm@deploy2002: urbanecm and lucaswerkmeister-wmde and cyndywikime: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 urbanecm@deploy2002: Started scap: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224)
  • 14:26 urbanecm@deploy2002: Finished scap: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314) (duration: 09m 48s)
  • 14:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:20 urbanecm@deploy2002: urbanecm and wmde-fisch: Continuing with sync
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53637 and previous config saved to /var/cache/conftool/dbconfig/20231120-141857-arnaudb.json
  • 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53636 and previous config saved to /var/cache/conftool/dbconfig/20231120-141835-arnaudb.json
  • 14:17 urbanecm@deploy2002: urbanecm and wmde-fisch: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:16 urbanecm@deploy2002: Started scap: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314)
  • 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'prepare reboot of es2032 for T344589', diff saved to https://phabricator.wikimedia.org/P53635 and previous config saved to /var/cache/conftool/dbconfig/20231120-141312-arnaudb.json
  • 14:12 urbanecm@deploy2002: Finished scap: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004) (duration: 07m 06s)
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'set es2028 as es1 master for T344589', diff saved to https://phabricator.wikimedia.org/P53634 and previous config saved to /var/cache/conftool/dbconfig/20231120-141131-arnaudb.json
  • 14:07 urbanecm@deploy2002: urbanecm and d3r1ck01: Continuing with sync
  • 14:06 urbanecm@deploy2002: urbanecm and d3r1ck01: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:05 urbanecm@deploy2002: Started scap: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004)
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P53633 and previous config saved to /var/cache/conftool/dbconfig/20231120-140329-arnaudb.json
  • 13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P53632 and previous config saved to /var/cache/conftool/dbconfig/20231120-134822-arnaudb.json
  • 13:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2178.codfw.wmnet onto db2192.codfw.wmnet
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host pc1014.eqiad.wmnet
  • 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53631 and previous config saved to /var/cache/conftool/dbconfig/20231120-133316-arnaudb.json
  • 13:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2180.codfw.wmnet onto db2193.codfw.wmnet
  • 13:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host pc1014.eqiad.wmnet
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53630 and previous config saved to /var/cache/conftool/dbconfig/20231120-125655-root.json
  • 12:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2178.codfw.wmnet onto db2192.codfw.wmnet
  • 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2178 in db2192 for T343674', diff saved to https://phabricator.wikimedia.org/P53629 and previous config saved to /var/cache/conftool/dbconfig/20231120-124522-arnaudb.json
  • 12:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53628 and previous config saved to /var/cache/conftool/dbconfig/20231120-124150-root.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53627 and previous config saved to /var/cache/conftool/dbconfig/20231120-122645-root.json
  • 12:22 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2180.codfw.wmnet onto db2193.codfw.wmnet
  • 12:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1143.eqiad.wmnet onto db1243.eqiad.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53625 and previous config saved to /var/cache/conftool/dbconfig/20231120-121140-root.json
  • 12:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2180 in db2193 for T343674', diff saved to https://phabricator.wikimedia.org/P53624 and previous config saved to /var/cache/conftool/dbconfig/20231120-120743-arnaudb.json
  • 12:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53623 and previous config saved to /var/cache/conftool/dbconfig/20231120-115635-root.json
  • 11:48 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210', diff saved to https://phabricator.wikimedia.org/P53622 and previous config saved to /var/cache/conftool/dbconfig/20231120-113205-root.json
  • 11:26 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 11:23 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 11:21 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::worker
  • 11:16 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::worker
  • 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::master
  • 11:00 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::master
  • 10:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1009.eqiad.wmnet with OS bullseye
  • 10:57 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::ml_etcd
  • 10:56 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management records for ganeti103[5-8] - T349925 - volans@cumin1001"
  • 10:55 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management records for ganeti103[5-8] - T349925 - volans@cumin1001"
  • 10:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::ml_etcd
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53620 and previous config saved to /var/cache/conftool/dbconfig/20231120-102327-arnaudb.json
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53619 and previous config saved to /var/cache/conftool/dbconfig/20231120-102303-arnaudb.json
  • 10:22 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:22 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:13 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:13 klausman@cumin1001: START - Cookbook sre.puppet.renew-cert for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:12 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:12 klausman@cumin1001: START - Cookbook sre.puppet.renew-cert for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53618 and previous config saved to /var/cache/conftool/dbconfig/20231120-100823-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53617 and previous config saved to /var/cache/conftool/dbconfig/20231120-100758-arnaudb.json
  • 10:05 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1143.eqiad.wmnet onto db1243.eqiad.wmnet
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'T344036 add db1243', diff saved to https://phabricator.wikimedia.org/P53616 and previous config saved to /var/cache/conftool/dbconfig/20231120-100212-arnaudb.json
  • 10:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53615 and previous config saved to /var/cache/conftool/dbconfig/20231120-095318-arnaudb.json
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53614 and previous config saved to /var/cache/conftool/dbconfig/20231120-095253-arnaudb.json
  • 09:50 Emperor: restart swift_dispersion_stats on thanos-fe1001
  • 09:41 godog: add 50G to prometheus/k8s in codfw
  • 09:39 godog: add 50G to prometheus/services in eqiad
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53613 and previous config saved to /var/cache/conftool/dbconfig/20231120-093813-arnaudb.json
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53612 and previous config saved to /var/cache/conftool/dbconfig/20231120-093748-arnaudb.json
  • 09:34 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 09:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53611 and previous config saved to /var/cache/conftool/dbconfig/20231120-092308-arnaudb.json
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53610 and previous config saved to /var/cache/conftool/dbconfig/20231120-092243-arnaudb.json
  • 09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 09:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53609 and previous config saved to /var/cache/conftool/dbconfig/20231120-090803-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53608 and previous config saved to /var/cache/conftool/dbconfig/20231120-090738-arnaudb.json
  • 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53607 and previous config saved to /var/cache/conftool/dbconfig/20231120-085636-root.json
  • 08:54 XioNoX: Refresh client certificate for central logging on pfw's - T351110
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53606 and previous config saved to /var/cache/conftool/dbconfig/20231120-085258-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53605 and previous config saved to /var/cache/conftool/dbconfig/20231120-085233-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53603 and previous config saved to /var/cache/conftool/dbconfig/20231120-083753-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53602 and previous config saved to /var/cache/conftool/dbconfig/20231120-083729-arnaudb.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53601 and previous config saved to /var/cache/conftool/dbconfig/20231120-082625-root.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53600 and previous config saved to /var/cache/conftool/dbconfig/20231120-082248-arnaudb.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53599 and previous config saved to /var/cache/conftool/dbconfig/20231120-082224-arnaudb.json
  • 08:18 kartik@deploy2002: Finished scap: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915) (duration: 11m 49s)
  • 08:13 kartik@deploy2002: kartik: Continuing with sync
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53598 and previous config saved to /var/cache/conftool/dbconfig/20231120-081120-root.json
  • 08:10 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:08 kartik@deploy2002: kartik: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:07 kartik@deploy2002: Started scap: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915)
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53597 and previous config saved to /var/cache/conftool/dbconfig/20231120-080541-arnaudb.json
  • 08:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53596 and previous config saved to /var/cache/conftool/dbconfig/20231120-080519-arnaudb.json
  • 08:00 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 07m 52s)
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53595 and previous config saved to /var/cache/conftool/dbconfig/20231120-075615-root.json
  • 07:54 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:54 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:52 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53594 and previous config saved to /var/cache/conftool/dbconfig/20231120-075013-arnaudb.json
  • 07:49 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:48 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:37 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bookworm
  • 07:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53593 and previous config saved to /var/cache/conftool/dbconfig/20231120-073506-arnaudb.json
  • 07:34 moritzm: installing ncurses security updates
  • 07:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53592 and previous config saved to /var/cache/conftool/dbconfig/20231120-072000-arnaudb.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bookworm
  • 07:04 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284) (duration: 07m 58s)
  • 06:58 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:58 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:56 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284)
  • 06:54 moritzm: installing python3.7 security updates
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host apt1002.wikimedia.org
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210 T351283', diff saved to https://phabricator.wikimedia.org/P53591 and previous config saved to /var/cache/conftool/dbconfig/20231120-064733-root.json
  • 06:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host apt1002.wikimedia.org
  • 06:25 moritzm: installing qemu security updates on bullseye
  • 06:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53590 and previous config saved to /var/cache/conftool/dbconfig/20231120-061928-arnaudb.json
  • 06:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 06:19 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 06:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53589 and previous config saved to /var/cache/conftool/dbconfig/20231120-061906-arnaudb.json
  • 06:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 06:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 06:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P53588 and previous config saved to /var/cache/conftool/dbconfig/20231120-060400-arnaudb.json
  • 05:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P53587 and previous config saved to /var/cache/conftool/dbconfig/20231120-054853-arnaudb.json
  • 05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53586 and previous config saved to /var/cache/conftool/dbconfig/20231120-053347-arnaudb.json
  • 00:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53585 and previous config saved to /var/cache/conftool/dbconfig/20231120-003846-arnaudb.json
  • 00:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53584 and previous config saved to /var/cache/conftool/dbconfig/20231120-003824-arnaudb.json
  • 00:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P53583 and previous config saved to /var/cache/conftool/dbconfig/20231120-002317-arnaudb.json
  • 00:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P53582 and previous config saved to /var/cache/conftool/dbconfig/20231120-000811-arnaudb.json
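
The entries above share a regular shape: a time, an actor (either `user@host` or a bare nick), and a free-form message, with cookbook runs logged as `START - Cookbook <name>` and `END (<STATUS>) - Cookbook <name> (exit_code=N)`. As a minimal sketch for anyone who wants to filter these entries, the following parser splits one line into those parts. The names `Entry` and `parse_entry` are hypothetical and not part of any Wikimedia tooling, and the patterns are inferred only from the lines on this page, so other archives may need adjustments.

```python
import re
from typing import NamedTuple, Optional

# "  • HH:MM actor: message" (the actor is everything up to the first colon).
ENTRY_RE = re.compile(
    r"^\s*•\s+(?P<time>\d{2}:\d{2})\s+(?P<actor>[^\s:]+):\s+(?P<message>.*)$"
)

# "START - Cookbook name ..." or "END (PASS) - Cookbook name (exit_code=0) ..."
COOKBOOK_RE = re.compile(
    r"^(?P<phase>START|END)(?: \((?P<status>[A-Z]+)\))? - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
)


class Entry(NamedTuple):
    """One parsed log line (hypothetical structure, for illustration only)."""
    time: str
    user: str
    host: Optional[str]       # None when the actor is a bare nick
    message: str
    cookbook: Optional[str]   # None when the message is not a cookbook run
    phase: Optional[str]      # "START" or "END"
    status: Optional[str]     # e.g. "PASS", "FAIL", "ERROR" (END lines only)
    exit_code: Optional[int]


def parse_entry(line: str) -> Optional[Entry]:
    """Return an Entry for a bullet line, or None for headings and blanks."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    user, _, host = m["actor"].partition("@")
    cb = COOKBOOK_RE.match(m["message"])
    code = cb["exit_code"] if cb else None
    return Entry(
        time=m["time"],
        user=user,
        host=host or None,
        message=m["message"],
        cookbook=cb["name"] if cb else None,
        phase=cb["phase"] if cb else None,
        status=cb["status"] if cb else None,
        exit_code=int(code) if code is not None else None,
    )
```

For example, collecting every entry whose `status` is not `PASS` and whose `phase` is `END` would list the failed cookbook runs for a given day.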

2023-11-19

  • 23:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53581 and previous config saved to /var/cache/conftool/dbconfig/20231119-235305-arnaudb.json
  • 18:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53580 and previous config saved to /var/cache/conftool/dbconfig/20231119-183758-arnaudb.json
  • 18:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53579 and previous config saved to /var/cache/conftool/dbconfig/20231119-183736-arnaudb.json
  • 18:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P53578 and previous config saved to /var/cache/conftool/dbconfig/20231119-182230-arnaudb.json
  • 18:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P53577 and previous config saved to /var/cache/conftool/dbconfig/20231119-180723-arnaudb.json
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53576 and previous config saved to /var/cache/conftool/dbconfig/20231119-175217-arnaudb.json
  • 12:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53575 and previous config saved to /var/cache/conftool/dbconfig/20231119-123433-arnaudb.json
  • 12:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 12:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 03:19 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 02:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
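
Note for readers processing this archive: the dbctl commit entries above follow a regular shape (datacenter scope, a quoted reason, a Phabricator paste URL for the diff, and a path to the saved previous config). The following is a minimal Python sketch for pulling those fields out of one entry. The function name `parse_dbctl` and the returned keys are hypothetical, and reading the saved file name as `YYYYMMDD-HHMMSS-user.json` is an assumption inferred from the entries in this log, not documented dbctl behaviour.

```python
import re
from datetime import datetime

# Shape of a dbctl commit message as it appears in this log.
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<reason>.*)', "
    r"diff saved to (?P<diff>\S+) and previous config saved to (?P<backup>\S+)"
)
# Assumed file-name layout: <YYYYMMDD>-<HHMMSS>-<user>.json
STAMP_RE = re.compile(r"/(\d{8}-\d{6})-([^/]+)\.json$")


def parse_dbctl(message):
    """Return a dict of fields from a dbctl commit entry, or None if it is not one."""
    m = DBCTL_RE.search(message)
    if m is None:
        return None
    info = m.groupdict()
    s = STAMP_RE.search(info["backup"])
    if s is not None:
        # Timestamp and user are read from the file name only (assumption).
        info["saved_at"] = datetime.strptime(s.group(1), "%Y%m%d-%H%M%S")
        info["saved_by"] = s.group(2)
    return info
```

Entries that are not dbctl commits (cookbook runs, helmfile deploys, free-form notes) return `None`, so the function can be applied to every line of a day's list.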

2023-11-18

  • 21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 21:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53574 and previous config saved to /var/cache/conftool/dbconfig/20231118-213454-arnaudb.json
  • 21:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P53573 and previous config saved to /var/cache/conftool/dbconfig/20231118-211947-arnaudb.json
  • 21:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P53572 and previous config saved to /var/cache/conftool/dbconfig/20231118-210441-arnaudb.json
  • 20:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53571 and previous config saved to /var/cache/conftool/dbconfig/20231118-204934-arnaudb.json
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53570 and previous config saved to /var/cache/conftool/dbconfig/20231118-145043-arnaudb.json
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53569 and previous config saved to /var/cache/conftool/dbconfig/20231118-145003-arnaudb.json
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P53568 and previous config saved to /var/cache/conftool/dbconfig/20231118-143457-arnaudb.json
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P53567 and previous config saved to /var/cache/conftool/dbconfig/20231118-141950-arnaudb.json
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53566 and previous config saved to /var/cache/conftool/dbconfig/20231118-140444-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53565 and previous config saved to /var/cache/conftool/dbconfig/20231118-085142-arnaudb.json
  • 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53564 and previous config saved to /var/cache/conftool/dbconfig/20231118-085121-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P53563 and previous config saved to /var/cache/conftool/dbconfig/20231118-083615-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P53562 and previous config saved to /var/cache/conftool/dbconfig/20231118-082108-arnaudb.json
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53561 and previous config saved to /var/cache/conftool/dbconfig/20231118-080602-arnaudb.json
  • 03:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53560 and previous config saved to /var/cache/conftool/dbconfig/20231118-030303-arnaudb.json
  • 03:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 03:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
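
Note for readers processing this archive: most entries above have the form `HH:MM user@host: message`, and cookbook runs are logged as a `START` line followed by an `END (PASS|FAIL|ERROR)` line carrying an exit code. The sketch below, in Python, splits one entry into those parts. The names `Entry` and `parse_entry` are hypothetical, and the patterns are derived only from the entries visible in this log, so other message shapes may need extra handling.

```python
import re
from typing import NamedTuple, Optional

# "HH:MM who: message", with an optional leading bullet.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+(?P<who>[^\s:]+):\s+(?P<msg>.*)$"
)
# "START - Cookbook name ..." or "END (STATUS) - Cookbook name (exit_code=N) ..."
COOKBOOK_RE = re.compile(
    r"^(?:START|END \((?P<status>[A-Z]+)\)) - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<code>\d+)\))?"
)


class Entry(NamedTuple):
    time: str
    user: str
    host: Optional[str]       # None when the author is logged without "@host"
    message: str
    cookbook: Optional[str]   # None for non-cookbook entries
    status: Optional[str]     # "START", "PASS", "FAIL", "ERROR", or None
    exit_code: Optional[int]


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one log line; return None for headings and blank lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    user, _, host = m["who"].partition("@")
    msg = m["msg"]
    c = COOKBOOK_RE.match(msg)
    if c is None:
        return Entry(m["time"], user, host or None, msg, None, None, None)
    status = c["status"] or "START"
    code = int(c["code"]) if c["code"] is not None else None
    return Entry(m["time"], user, host or None, msg, c["name"], status, code)
```

Because the log is in reverse chronological order, an `END` entry appears above its matching `START`; pairing them would mean matching on user, cookbook name, and target while walking a day's list from the bottom up.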

2023-11-17

  • 23:39 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:38 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53559 and previous config saved to /var/cache/conftool/dbconfig/20231117-215947-arnaudb.json
  • 21:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P53558 and previous config saved to /var/cache/conftool/dbconfig/20231117-214441-arnaudb.json
  • 21:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P53557 and previous config saved to /var/cache/conftool/dbconfig/20231117-212935-arnaudb.json
  • 21:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53556 and previous config saved to /var/cache/conftool/dbconfig/20231117-211428-arnaudb.json
  • 19:51 bvibber: brion regenerating .m3u8 streaming manifests for all video files on mwmaint2002 (cleanup for T350996)
  • 18:05 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:04 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:01 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:59 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:59 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:58 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:50 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:49 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:48 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:47 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1038
  • 17:47 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:46 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1038
  • 17:46 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1037
  • 17:45 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1036
  • 17:45 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1037
  • 17:43 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1036
  • 17:42 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1035
  • 17:40 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1035
  • 17:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1142.eqiad.wmnet onto db1242.eqiad.wmnet
  • 16:46 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53553 and previous config saved to /var/cache/conftool/dbconfig/20231117-161806-arnaudb.json
  • 16:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53552 and previous config saved to /var/cache/conftool/dbconfig/20231117-161744-arnaudb.json
  • 16:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P53551 and previous config saved to /var/cache/conftool/dbconfig/20231117-160238-arnaudb.json
  • 15:58 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:58 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:58 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:57 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P53550 and previous config saved to /var/cache/conftool/dbconfig/20231117-154731-arnaudb.json
  • 15:38 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:38 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53549 and previous config saved to /var/cache/conftool/dbconfig/20231117-153225-arnaudb.json
  • 15:05 XioNoX: cr1-esams> request chassis fpc slot 1 online - T351304
  • 14:45 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1142.eqiad.wmnet onto db1242.eqiad.wmnet
  • 14:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1142 in db1242 for T344036', diff saved to https://phabricator.wikimedia.org/P53547 and previous config saved to /var/cache/conftool/dbconfig/20231117-144234-arnaudb.json
  • 14:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:20 elukey@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 14:20 elukey@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1001.eqiad.wmnet
  • 13:48 jynus: reenable puppet on dbprov2001 T351491
  • 13:47 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 13:47 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 13:47 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1001.eqiad.wmnet
  • 13:46 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: END (ERROR) - Cookbook sre.puppet.migrate-host (exit_code=97) for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:44 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1003.eqiad.wmnet
  • 13:44 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1002.eqiad.wmnet
  • 13:44 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:42 moritzm: imported php-luasandbox 4.0.2-3+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 13:36 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:35 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1005.eqiad.wmnet
  • 13:33 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1004.eqiad.wmnet
  • 13:32 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1005.eqiad.wmnet
  • 13:30 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1006.eqiad.wmnet
  • 13:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1006.eqiad.wmnet
  • 13:26 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1007.eqiad.wmnet
  • 13:24 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1007.eqiad.wmnet
  • 12:54 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ldap-rw2001.wikimedia.org
  • 12:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw2001.wikimedia.org
  • 12:53 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ldap-rw2001.wikimedia.org
  • 12:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw2001.wikimedia.org
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 12:51 joal@deploy2002: Finished deploy [airflow-dags/analytics@a5e5ddc]: Airflow HOTFIX [airflow-dags/analytics@a5e5ddca] (duration: 00m 28s)
  • 12:50 joal@deploy2002: Started deploy [airflow-dags/analytics@a5e5ddc]: Airflow HOTFIX [airflow-dags/analytics@a5e5ddca]
  • 12:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw1001.wikimedia.org
  • 12:10 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:36 mabualruz@deploy2002: Finished scap: Backport for Fixes AMC outreach drawer (T351362) (duration: 07m 32s)
  • 11:30 mabualruz@deploy2002: mabualruz: Continuing with sync
  • 11:29 mabualruz@deploy2002: mabualruz: Backport for Fixes AMC outreach drawer (T351362) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:28 mabualruz@deploy2002: Started scap: Backport for Fixes AMC outreach drawer (T351362)
  • 11:20 jynus: running schema change on backup1-codfw (mediabackups) T191804
  • 11:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:11 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:10 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:10 jynus: running schema change on backup1-eqiad (mediabackups) T191804
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:07 claime: Redeploying mw-on-k8s for T350430
  • 10:54 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1008.eqiad.wmnet
  • 10:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:51 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1008.eqiad.wmnet
  • 10:31 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53542 and previous config saved to /var/cache/conftool/dbconfig/20231117-102952-arnaudb.json
  • 10:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 10:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53541 and previous config saved to /var/cache/conftool/dbconfig/20231117-102931-arnaudb.json
  • 10:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve-ctrl1001.eqiad.wmnet
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 10:20 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve-ctrl1002.eqiad.wmnet
  • 10:19 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:19 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:17 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"
  • 10:16 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:16 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:16 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P53540 and previous config saved to /var/cache/conftool/dbconfig/20231117-101425-arnaudb.json
  • 10:12 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"
  • 10:12 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 10:09 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1001.eqiad.wmnet
  • 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 09:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P53539 and previous config saved to /var/cache/conftool/dbconfig/20231117-095918-arnaudb.json
  • 09:51 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1002.eqiad.wmnet
  • 09:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53537 and previous config saved to /var/cache/conftool/dbconfig/20231117-094412-arnaudb.json
  • 09:38 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 09:31 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1003.eqiad.wmnet
  • 09:24 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with new runners
  • 09:22 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with new runners
  • 09:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:04 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:01 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host crm2001.codfw.wmnet
  • 09:01 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host crm2001.codfw.wmnet with OS bookworm
  • 08:45 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on crm2001.codfw.wmnet with reason: host reimage
  • 08:42 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on crm2001.codfw.wmnet with reason: host reimage
  • 08:30 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 08:25 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host crm2001.codfw.wmnet with OS bookworm
  • 08:15 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:14 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) crm2001.codfw.wmnet on all recursors
  • 08:14 jmm@cumin1001: START - Cookbook sre.dns.wipe-cache crm2001.codfw.wmnet on all recursors
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:13 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:10 jmm@cumin1001: START - Cookbook sre.dns.netbox
  • 08:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host crm2001.codfw.wmnet
  • 08:06 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:05 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host debmonitor2003.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host debmonitor2003.codfw.wmnet
  • 07:34 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:34 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:34 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:34 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2133.codfw.wmnet with OS bookworm
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2133.codfw.wmnet with reason: host reimage
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2133.codfw.wmnet with reason: host reimage
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2133.codfw.wmnet with OS bookworm
  • 06:48 mabualruz@deploy2002: Backport cancelled.
  • 04:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53535 and previous config saved to /var/cache/conftool/dbconfig/20231117-044504-arnaudb.json
  • 04:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 04:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 04:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53534 and previous config saved to /var/cache/conftool/dbconfig/20231117-044443-arnaudb.json
  • 04:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P53533 and previous config saved to /var/cache/conftool/dbconfig/20231117-042937-arnaudb.json
  • 04:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P53532 and previous config saved to /var/cache/conftool/dbconfig/20231117-041430-arnaudb.json
  • 03:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53531 and previous config saved to /var/cache/conftool/dbconfig/20231117-035924-arnaudb.json
  • 01:19 cstone: payments-wiki upgraded from eae2f35e to 56790715
  • 01:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 01:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1158']
  • 00:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 00:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 00:48 ejegg: fundraising civiproxy upgraded from c000fc1e to 6625c844
  • 00:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1157']
  • 00:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']

2023-11-16

  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1158']
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 23:43 samtar@deploy2002: Finished scap: Backport for Revert "Disable drawer temporarily while erroring" (duration: 07m 31s)
  • 23:37 samtar@deploy2002: samtar: Continuing with sync
  • 23:37 samtar@deploy2002: samtar: Backport for Revert "Disable drawer temporarily while erroring" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:35 samtar@deploy2002: Started scap: Backport for Revert "Disable drawer temporarily while erroring"
  • 23:34 samtar@deploy2002: Sync cancelled.
  • 23:33 topranks: Change VRRP IP for public1-a-codfw vlan on codfw CRs T347191
  • 23:30 topranks: Add gateway IP for public1-a-codfw Vlan to ssw in codfw T347191
  • 23:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 23:30 samtar@deploy2002: jdlrobson and samtar: Backport for Disable drawer temporarily while erroring (T351362) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 23:28 samtar@deploy2002: Started scap: Backport for Disable drawer temporarily while erroring (T351362)
  • 23:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 2001 entries - cmooney@cumin1001"
  • 23:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 2001 entries - cmooney@cumin1001"
  • 23:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 23:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: Move public1-a-codfw vlan GW from codfw CR routers to ssw
  • 23:10 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: Move public1-a-codfw vlan GW from codfw CR routers to ssw
  • 22:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53529 and previous config saved to /var/cache/conftool/dbconfig/20231116-223915-arnaudb.json
  • 22:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:36 mutante: disabled puppet on miscweb*, netmon* and phab* hosts, deploying gerrit:974285, confirming noop
  • 22:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 1117 entries - cmooney@cumin1001"
  • 22:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 1117 entries - cmooney@cumin1001"
  • 22:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 22:00 dr0ptp4kt@deploy2002: Finished scap: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463) (duration: 09m 50s)
  • 21:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1157']
  • 21:54 dr0ptp4kt@deploy2002: dr0ptp4kt and soda: Continuing with sync
  • 21:53 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1157']
  • 21:53 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1157']
  • 21:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:51 dr0ptp4kt@deploy2002: dr0ptp4kt and soda: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 dr0ptp4kt@deploy2002: Started scap: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463)
  • 21:43 topranks: Removing VRRP config for public1-b-codfw on codfw CRs (T347191)
  • 21:38 dr0ptp4kt@deploy2002: Finished scap: Backport for Conditionally render the content of header-action instead of the slot (T351121) (duration: 07m 36s)
  • 21:32 dr0ptp4kt@deploy2002: dr0ptp4kt and jforrester: Continuing with sync
  • 21:32 dr0ptp4kt@deploy2002: dr0ptp4kt and jforrester: Backport for Conditionally render the content of header-action instead of the slot (T351121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 dr0ptp4kt@deploy2002: Started scap: Backport for Conditionally render the content of header-action instead of the slot (T351121)
  • 21:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:18 dr0ptp4kt@deploy2002: Finished scap: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353) (duration: 11m 12s)
  • 21:12 dr0ptp4kt@deploy2002: dr0ptp4kt and dani: Continuing with sync
  • 21:08 dr0ptp4kt@deploy2002: dr0ptp4kt and dani: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 dr0ptp4kt@deploy2002: Started scap: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353)
  • 20:54 topranks: changing VRRP GW IP for public1-b-codfw on codfw CRs and disabling IPv6 RAs on the CRs (T347191)
  • 20:41 topranks: adding anycast GW for public1-b-codfw vlan to codfw spine switches (T347191)
  • 20:23 dr0ptp4kt@deploy2002: Finished deploy [airflow-dags/search@b00c6ca]: Deploying Airflow search WDQS graph split HDFS job (duration: 00m 27s)
  • 20:23 dr0ptp4kt@deploy2002: Started deploy [airflow-dags/search@b00c6ca]: Deploying Airflow search WDQS graph split HDFS job
  • 19:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:47 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:47 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for IPs in public1-b-codfw vlan - cmooney@cumin1001"
  • 19:46 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for IPs in public1-b-codfw vlan - cmooney@cumin1001"
  • 19:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 19:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.5 refs T350081
  • 19:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1164
  • 19:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1164
  • 19:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1163
  • 19:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1163
  • 19:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:04 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:02 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:01 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:01 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:56 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:56 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:56 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:55 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 18:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 18:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 18:44 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 18:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 18:14 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53526 and previous config saved to /var/cache/conftool/dbconfig/20231116-174800-arnaudb.json
  • 17:44 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P53525 and previous config saved to /var/cache/conftool/dbconfig/20231116-173254-arnaudb.json
  • 17:29 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:27 brett: Re-enabling puppet on all acme-chief clients post-bookworm upgrade - T342154
  • 17:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:20 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:19 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:18 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P53523 and previous config saved to /var/cache/conftool/dbconfig/20231116-171748-arnaudb.json
  • 17:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:13 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 17:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief1001.eqiad.wmnet
  • 17:08 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief1001.eqiad.wmnet
  • 17:07 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 17:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53522 and previous config saved to /var/cache/conftool/dbconfig/20231116-170241-arnaudb.json
  • 17:00 brett: Disabling puppet on all acme-chief clients for acme-chief bookworm upgrades - T342154
  • 16:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:52 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:50 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1001.eqiad.wmnet with OS bookworm
  • 16:39 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 16:37 sukhe: repool cp4037
  • 16:31 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 6 hosts with reason: Extending downtime for depooled cp hosts
  • 16:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 6 hosts with reason: Extending downtime for depooled cp hosts
  • 16:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
  • 16:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:26 fabfur: swapped cp1109 <-> cp1084 (T349244)
  • 16:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1109.eqiad.wmnet
  • 16:24 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1109.eqiad.wmnet
  • 16:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:20 fabfur: swapped cp1108 <-> cp1083 (T349244)
  • 16:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1108.eqiad.wmnet
  • 16:18 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1108.eqiad.wmnet
  • 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
  • 16:17 brett@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief1001.eqiad.wmnet with OS bookworm
  • 16:17 sukhe: depool cp4037 for reboot [post puppet 7 upgrade]
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4037.ulsfo.wmnet
  • 16:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['aqs1012']
  • 16:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4037.ulsfo.wmnet
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::logging
  • 15:55 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1002.eqiad.wmnet with OS bullseye
  • 15:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::logging
  • 15:38 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1002.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['aqs1012']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['aqs1012']
  • 15:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['aqs1012']
  • 15:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 15:35 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1002.eqiad.wmnet with reason: host reimage
  • 15:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1141.eqiad.wmnet onto db1241.eqiad.wmnet
  • 15:22 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1002.eqiad.wmnet with OS bullseye
  • 15:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: revert logstash changes - bking@cumin2002 - T324335
  • 15:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 15:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 15:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: revert logstash changes - bking@cumin2002 - T324335
  • 15:15 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 15:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 15:01 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 15:01 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:57 arnaudb@cumin1001: dbctl commit (dc=all): 'remove db1136', diff saved to https://phabricator.wikimedia.org/P53519 and previous config saved to /var/cache/conftool/dbconfig/20231116-145754-arnaudb.json
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1136.eqiad.wmnet
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1136.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:56 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1136.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:54 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:49 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1136.eqiad.wmnet
  • 14:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:48 claime: Redeploying mw-on-k8s for T350430
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141626
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141626
  • 14:43 jbond: re-enable puppet on puppet7 agents
  • 14:43 kartik@deploy2002: Finished scap: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273) (duration: 21m 41s)
  • 14:37 kartik@deploy2002: kartik and abi: Continuing with sync
  • 14:23 kartik@deploy2002: kartik and abi: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 kartik@deploy2002: Started scap: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273)
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::monitoring_bullseye
  • 14:15 jbond: stop puppet on puppet7 agents to debug puppet performance
  • 14:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 14:09 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 14:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 14:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::monitoring_bullseye
  • 14:07 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: prometheus
  • 13:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: prometheus
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 13:44 jynus: restart bacula at backup1001
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 13:39 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2001.codfw.wmnet
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ms-be2050.codfw.wmnet
  • 13:34 sergi0: stat1008: Add `sowiki`, `stwiki`, `tgwiki` and `ugwiki` to `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T340944)
  • 13:33 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host backup2001.codfw.wmnet
  • 13:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dbprov2001.codfw.wmnet
  • 13:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ms-be2050.codfw.wmnet
  • 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ms-be2050.codfw.wmnet
  • 13:21 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host dbprov2001.codfw.wmnet
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1001.eqiad.wmnet
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 13:10 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host backup1001.eqiad.wmnet
  • 13:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1133.eqiad.wmnet
  • 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1014.eqiad.wmnet
  • 13:02 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host db1133.eqiad.wmnet
  • 13:00 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1141.eqiad.wmnet onto db1241.eqiad.wmnet
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'cloning db1141 - T350458', diff saved to https://phabricator.wikimedia.org/P53516 and previous config saved to /var/cache/conftool/dbconfig/20231116-125649-arnaudb.json
  • 12:56 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 12:55 arnaudb@cumin1001: dbctl commit (dc=all): 'cloning db1141 - T350458', diff saved to https://phabricator.wikimedia.org/P53515 and previous config saved to /var/cache/conftool/dbconfig/20231116-125515-arnaudb.json
  • 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisioning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisioning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisioning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisioning db1241.eqiad.wmnet - T344036
  • 12:54 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 12:33 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cumin2002.codfw.wmnet
  • 12:33 marostegui: Install Test MariaDB 10.6.16 (Bookworm) on pc2014 T351283
  • 12:29 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 12:29 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 12:29 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1124.eqiad.wmnet
  • 12:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 12:23 jmm@cumin1001: START - Cookbook sre.puppet.migrate-host for host cumin2002.codfw.wmnet
  • 12:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host db1124.eqiad.wmnet
  • 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ms-fe1014.eqiad.wmnet
  • 11:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host clouddb1021.eqiad.wmnet
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 11:49 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host clouddb1021.eqiad.wmnet
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::serviceops
  • 11:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53514 and previous config saved to /var/cache/conftool/dbconfig/20231116-114511-arnaudb.json
  • 11:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T348183)', diff saved to https://phabricator.wikimedia.org/P53513 and previous config saved to /var/cache/conftool/dbconfig/20231116-114450-arnaudb.json
  • 11:34 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 11:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1004.eqiad.wmnet
  • 11:34 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::serviceops
  • 11:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P53512 and previous config saved to /var/cache/conftool/dbconfig/20231116-112942-arnaudb.json
  • 09:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53502 and previous config saved to /var/cache/conftool/dbconfig/20231116-094005-arnaudb.json
  • 09:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53501 and previous config saved to /var/cache/conftool/dbconfig/20231116-092500-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53500 and previous config saved to /var/cache/conftool/dbconfig/20231116-090955-arnaudb.json
  • 09:00 godog: bounce prometheus instances on prometheus2006 to test p7 upgrade
  • 08:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::worker
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: thanos::frontend
  • 08:37 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:37 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:34 moritzm: installing ruby-rails-html-sanitizer security updates
  • 08:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: thanos::frontend
  • 08:25 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host clouddumps1001.wikimedia.org
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host prometheus2006.codfw.wmnet
  • 08:19 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host clouddumps1001.wikimedia.org
  • 08:18 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 08:17 moritzm: installing elfutils security updates
  • 08:12 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudcumin2001.codfw.wmnet
  • 08:09 moritzm: installing python-git security updates
  • 08:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host prometheus2006.codfw.wmnet
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ncredir4001.ulsfo.wmnet
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: prometheus::pop
  • 07:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: prometheus::pop
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 06:07 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2004.codfw.wmnet with OS bullseye
  • 05:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 05:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T348183)', diff saved to https://phabricator.wikimedia.org/P53499 and previous config saved to /var/cache/conftool/dbconfig/20231116-053616-arnaudb.json
  • 05:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53498 and previous config saved to /var/cache/conftool/dbconfig/20231116-053554-arnaudb.json
  • 05:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P53497 and previous config saved to /var/cache/conftool/dbconfig/20231116-052048-arnaudb.json
  • 05:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P53496 and previous config saved to /var/cache/conftool/dbconfig/20231116-050542-arnaudb.json
  • 04:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 04:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53495 and previous config saved to /var/cache/conftool/dbconfig/20231116-045035-arnaudb.json
  • 04:38 cstone: payments-wiki upgraded from 6affb60a to eae2f35e
  • 04:30 cstone: payments-wiki upgraded from 084370bb to 6affb60a
  • 04:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 03:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 03:40 ejegg: fundraising civicrm upgraded from 6e53198c to 32679ea3
  • 03:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 01:53 cstone: payments-wiki upgraded from b4465e23 to 084370bb
  • 01:34 eileen: civicrm upgraded from ec6992e0 to 6e53198c
  • 00:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye

2023-11-15

  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53494 and previous config saved to /var/cache/conftool/dbconfig/20231115-235044-arnaudb.json
  • 23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53493 and previous config saved to /var/cache/conftool/dbconfig/20231115-235023-arnaudb.json
  • 23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P53492 and previous config saved to /var/cache/conftool/dbconfig/20231115-233516-arnaudb.json
  • 23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P53491 and previous config saved to /var/cache/conftool/dbconfig/20231115-232010-arnaudb.json
  • 23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53490 and previous config saved to /var/cache/conftool/dbconfig/20231115-230504-arnaudb.json
  • 23:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 22:59 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudelastic1007.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:58 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for cloudelastic1007.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:57 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudelastic1008.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:57 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for cloudelastic1008.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudelastic[1007-1010].wikimedia.org with reason: new cloudelastic hosts T351354
  • 22:41 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudelastic[1007-1010].wikimedia.org with reason: new cloudelastic hosts T351354
  • 22:20 ryankemper: T351354 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/974693; running puppet on hosts
  • 19:39 topranks: re-enabling puppet on DNS hosts to adjust TTL setting in BIRD (T350488)
  • 19:37 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:34 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 19:23 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.5 refs T350081 (duration: 05m 52s)
  • 19:17 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.5 refs T350081
  • 19:15 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aphlict
  • 19:10 topranks: merging patch to remove TTL restriction on Bird Anycast BGP peerings (T350488)
  • 19:09 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aphlict
  • 19:07 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 19:07 mutante: aphlict2001 - restart aphlict service after puppet 7 upgrade
  • 19:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::virt_ceph
  • 19:01 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudlb2001-dev.codfw.wmnet
  • 19:00 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 18:59 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::services
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aphlict2001.codfw.wmnet
  • 18:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:58 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:56 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 18:54 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:54 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::net
  • 18:54 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host aphlict2001.codfw.wmnet
  • 18:54 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudgw2003-dev.codfw.wmnet
  • 18:51 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::services
  • 18:49 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 18:45 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::net
  • 18:42 topranks: Reset BGP to lvs4010 from cr3-ulsfo to validate new config T350488
  • 18:41 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudgw2002-dev.codfw.wmnet
  • 18:36 topranks: remove TTL setting on server-facing BGP peerings on cr3-ulsfo T350488
  • 18:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::db
  • 18:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 18:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 18:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::db
  • 18:12 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53488 and previous config saved to /var/cache/conftool/dbconfig/20231115-180503-arnaudb.json
  • 18:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:01 jynus: All restart_daemons were successful
  • 18:01 root@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 17:57 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:56 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:56 root@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 17:52 inflatador: bking@wdqs1024 reboot host to hopefully reduce data reload failures T349011
  • 17:51 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 17:27 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 17:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:23 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:18 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 16:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1102.eqiad.wmnet
  • 16:52 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1102.eqiad.wmnet
  • 16:45 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1102.eqiad.wmnet
  • 16:36 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1102.eqiad.wmnet
  • 16:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1102.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:25 fabfur@cumin1001: START - Cookbook sre.hosts.provision for host cp1102.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:25 elukey: reload thanos-rule on titan[12]001 to pick up new pyrra generated configs
  • 16:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp1102.eqiad.wmnet with reason: BIOS settings fix
  • 16:21 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cp1102.eqiad.wmnet with reason: BIOS settings fix
  • 16:19 fabfur: depooling cp1102 for BIOS options fix
  • 16:16 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1130', diff saved to https://phabricator.wikimedia.org/P53486 and previous config saved to /var/cache/conftool/dbconfig/20231115-161600-arnaudb.json
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs6003.drmrs.wmnet
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1130.eqiad.wmnet
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1130.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:57 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1130.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:56 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs6003.drmrs.wmnet
  • 15:55 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1130.eqiad.wmnet
  • 15:48 fabfur: swapped cp1107 <-> cp1082 (T349244)
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host doh6001.wikimedia.org
  • 15:46 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1107.eqiad.wmnet
  • 15:46 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1107.eqiad.wmnet
  • 15:44 fabfur: swapped cp1106 <-> cp1081 (T349244)
  • 15:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1106.eqiad.wmnet
  • 15:43 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1106.eqiad.wmnet
  • 15:41 godog: bounce prometheus-blackbox-exporter on prometheus4002
  • 15:40 godog: bounce prometheus@ops on prometheus4002
  • 15:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host doh6001.wikimedia.org
  • 15:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host durum1001.eqiad.wmnet
  • 15:28 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1127', diff saved to https://phabricator.wikimedia.org/P53485 and previous config saved to /var/cache/conftool/dbconfig/20231115-152836-arnaudb.json
  • 15:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 15:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host durum1001.eqiad.wmnet
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1024.eqiad.wmnet
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1127.eqiad.wmnet
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1127.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:21 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1127.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:19 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 15:13 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1127.eqiad.wmnet
  • 15:12 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1024.eqiad.wmnet
  • 15:09 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 15:08 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: vrts
  • 15:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 15:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: vrts
  • 14:50 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 14:47 awight@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143) (duration: 08m 16s)
  • 14:47 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1003.eqiad.wmnet with OS bullseye
  • 14:45 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::control
  • 14:42 awight@deploy2002: sgimeno and awight: Continuing with sync
  • 14:41 awight@deploy2002: sgimeno and awight: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:39 awight@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143)
  • 14:37 awight@deploy2002: Finished scap: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607) (duration: 16m 09s)
  • 14:35 claime: Raised mw-on-k8s to 20% of external traffic, rollout will happen over the next half hour - T348122
  • 14:31 awight@deploy2002: daimona and awight: Continuing with sync
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 14:30 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 14:26 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d] (hadoop-test): Regular analytics weekly train - TEST - HOTFIX [analytics/refinery@3e9df5d8] (duration: 03m 13s)
  • 14:23 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d] (hadoop-test): Regular analytics weekly train - TEST - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kubernetes2054.codfw.wmnet
  • 14:23 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
  • 14:23 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d] (thin): Regular analytics weekly train - THIN - HOTFIX [analytics/refinery@3e9df5d8] (duration: 00m 07s)
  • 14:23 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d] (thin): Regular analytics weekly train - THIN - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:22 awight@deploy2002: daimona and awight: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 awight@deploy2002: Started scap: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607)
  • 14:20 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
  • 14:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host kubernetes2054.codfw.wmnet
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 14:08 sukhe: running authdns-update to depool esams
  • 14:03 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye
  • 14:03 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8] (duration: 00m 06s)
  • 14:03 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:03 XioNoX: reboot fpc0 on cr1-esams - T346779
  • 14:00 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 13:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host thanos-be2001.codfw.wmnet
  • 13:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::control
  • 13:55 XioNoX: disable peering/transit on cr1-esams for linecard reboot - T346779
  • 13:52 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8] (duration: 08m 16s)
  • 13:50 taavi: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/973769/ core routers
  • 13:44 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8]
  • 13:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:40 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::kubernetes
  • 13:38 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 sfaci@deploy2002: Finished deploy [airflow-dags/analytics_test@5a47584]: Regular analytics weekly train [airflow/analytics_test@5a475842] (duration: 00m 14s)
  • 13:31 sfaci@deploy2002: Started deploy [airflow-dags/analytics_test@5a47584]: Regular analytics weekly train [airflow/analytics_test@5a475842]
  • 13:29 sfaci@deploy2002: Finished deploy [airflow-dags/analytics@5a47584]: Regular analytics weekly train [airflow/analytics@5a475842] (duration: 00m 27s)
  • 13:29 sfaci@deploy2002: Started deploy [airflow-dags/analytics@5a47584]: Regular analytics weekly train [airflow/analytics@5a475842]
  • 13:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::kubernetes
  • 13:22 sfaci@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train [airflow/analytics_test@c203642a] (duration: 00m 06s)
  • 13:21 sfaci@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train [airflow/analytics_test@c203642a]
  • 13:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:17 topranks: resetting FPC1 card in cr1-esams which has a major error and gone offline (T351304)
  • 13:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bullseye
  • 13:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 13:10 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 13:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:56 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 12:56 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 12:55 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:54 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:54 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:52 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 12:49 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 12:33 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 11:57 stevemunene@deploy2002: Finished deploy [airflow-dags/wmde@91810bc]: (no justification provided) (duration: 00m 10s)
  • 11:56 stevemunene@deploy2002: Started deploy [airflow-dags/wmde@91810bc]: (no justification provided)
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::unowned
  • 11:48 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::unowned
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 11:24 taavi: update cr*-{codfw,eqiad} firewall policy via homer to update cloudcontrol1006 addressing
  • 11:24 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:21 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:20 btullis@cumin1001: END (ERROR) - Cookbook sre.druid.roll-restart-workers (exit_code=97) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:18 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:17 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host thanos-fe2001.codfw.wmnet
  • 11:14 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: miscweb
  • 10:44 tchanders@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:42 tchanders@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:41 tchanders@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:40 tchanders@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:39 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 10:39 _joe_: roll restart of mobileapps in codfw and eqiad
  • 10:34 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:31 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:30 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: miscweb
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 09:37 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup_noferm
  • 09:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup_noferm
  • 09:09 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 08:37 moritzm: rolling restart of Cassandra in cassandra-dev following migration to Puppet 7
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cassandra_dev
  • 08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cassandra_dev
  • 08:01 marostegui@deploy2002: Finished scap: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""" (duration: 06m 54s)
  • 08:00 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1127', diff saved to https://phabricator.wikimedia.org/P53483 and previous config saved to /var/cache/conftool/dbconfig/20231115-080033-arnaudb.json
  • 07:55 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:55 marostegui@deploy2002: marostegui: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:54 marostegui@deploy2002: Started scap: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"""
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bookworm
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: pybaltest
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2013.codfw.wmnet with reason: host reimage
  • 07:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: pybaltest
  • 07:34 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: mariadb::misc::analytics::backup
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bookworm
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Reimage
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Reimage
  • 07:15 marostegui@deploy2002: Finished scap: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"" (duration: 06m 53s)
  • 07:10 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:10 marostegui@deploy2002: marostegui: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40934
  • 07:08 marostegui@deploy2002: Started scap: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""
  • 07:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40934
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 983
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 983
  • 01:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53482 and previous config saved to /var/cache/conftool/dbconfig/20231115-000545-arnaudb.json

2023-11-14

  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P53481 and previous config saved to /var/cache/conftool/dbconfig/20231114-235039-arnaudb.json
  • 23:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P53480 and previous config saved to /var/cache/conftool/dbconfig/20231114-233532-arnaudb.json
  • 23:26 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53479 and previous config saved to /var/cache/conftool/dbconfig/20231114-232026-arnaudb.json
  • 22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53478 and previous config saved to /var/cache/conftool/dbconfig/20231114-225258-arnaudb.json
  • 22:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53477 and previous config saved to /var/cache/conftool/dbconfig/20231114-225236-arnaudb.json
  • 22:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P53476 and previous config saved to /var/cache/conftool/dbconfig/20231114-223730-arnaudb.json
  • 22:33 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P53474 and previous config saved to /var/cache/conftool/dbconfig/20231114-222224-arnaudb.json
  • 22:19 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:07 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53473 and previous config saved to /var/cache/conftool/dbconfig/20231114-220717-arnaudb.json
  • 22:05 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53472 and previous config saved to /var/cache/conftool/dbconfig/20231114-220241-arnaudb.json
  • 22:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 22:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53471 and previous config saved to /var/cache/conftool/dbconfig/20231114-220220-arnaudb.json
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 21:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 21:48 urbanecm@deploy2002: Finished scap: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions (duration: 07m 36s)
  • 21:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P53470 and previous config saved to /var/cache/conftool/dbconfig/20231114-214713-arnaudb.json
  • 21:42 urbanecm@deploy2002: urbanecm and jdrewniak and ebernhardson: Continuing with sync
  • 21:42 urbanecm@deploy2002: urbanecm and jdrewniak and ebernhardson: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:40 urbanecm@deploy2002: Started scap: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions
  • 21:39 urbanecm@deploy2002: Finished scap: Backport for [Zebra] Remove underline from pages with blank title (T351119) (duration: 09m 59s)
  • 21:34 urbanecm@deploy2002: urbanecm and jdrewniak: Continuing with sync
  • 21:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 21:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P53469 and previous config saved to /var/cache/conftool/dbconfig/20231114-213207-arnaudb.json
  • 21:31 urbanecm@deploy2002: urbanecm and jdrewniak: Backport for [Zebra] Remove underline from pages with blank title (T351119) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 21:29 urbanecm@deploy2002: Started scap: Backport for [Zebra] Remove underline from pages with blank title (T351119)
  • 21:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 21:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 21:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 21:21 urbanecm@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002) (duration: 06m 49s)
  • 21:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53468 and previous config saved to /var/cache/conftool/dbconfig/20231114-211700-arnaudb.json
  • 21:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 21:16 urbanecm@deploy2002: dani and urbanecm and zoranzoki21: Continuing with sync
  • 21:15 urbanecm@deploy2002: dani and urbanecm and zoranzoki21: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:14 urbanecm@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002)
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53467 and previous config saved to /var/cache/conftool/dbconfig/20231114-211231-arnaudb.json
  • 21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53466 and previous config saved to /var/cache/conftool/dbconfig/20231114-211209-arnaudb.json
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P53465 and previous config saved to /var/cache/conftool/dbconfig/20231114-205703-arnaudb.json
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts phab-test1001.eqiad.wmnet
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab-test1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:49 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab-test1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 20:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 20:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts phab-test1001.eqiad.wmnet
  • 20:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:42 mutante: destroying phab-test1001.eqiad.wmnet - T351115
  • 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P53464 and previous config saved to /var/cache/conftool/dbconfig/20231114-204156-arnaudb.json
  • 20:41 mutante: doc2002 - systemctl start rsync-doc-host-data-sync - failed unit after maintenance reboot
  • 20:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:33 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:32 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doc1003.eqiad.wmnet with reason: maintenance
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: maintenance
  • 20:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 20:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doc2002.codfw.wmnet with reason: maintenance
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on doc2002.codfw.wmnet with reason: maintenance
  • 20:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 20:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53463 and previous config saved to /var/cache/conftool/dbconfig/20231114-202650-arnaudb.json
  • 20:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people2003.codfw.wmnet with reason: maintenance
  • 20:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 20:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people2003.codfw.wmnet with reason: maintenance
  • 20:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people1004.eqiad.wmnet with reason: maintenance
  • 20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people1004.eqiad.wmnet with reason: maintenance
  • 20:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53462 and previous config saved to /var/cache/conftool/dbconfig/20231114-202232-arnaudb.json
  • 20:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53461 and previous config saved to /var/cache/conftool/dbconfig/20231114-202154-arnaudb.json
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: doc
  • 20:21 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: doc
  • 20:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 20:09 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host doc2002.codfw.wmnet
  • 20:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P53460 and previous config saved to /var/cache/conftool/dbconfig/20231114-200648-arnaudb.json
  • 20:04 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 20:03 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 20:02 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host doc2002.codfw.wmnet
  • 20:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 19:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 19:57 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etherpad
  • 19:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 19:52 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etherpad
  • 19:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P53459 and previous config saved to /var/cache/conftool/dbconfig/20231114-195141-arnaudb.json
  • 19:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 19:39 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f94afe0] (duration: 03m 14s)
  • 19:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53458 and previous config saved to /var/cache/conftool/dbconfig/20231114-193635-arnaudb.json
  • 19:36 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f94afe0]
  • 19:35 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe] (thin): Regular analytics weekly train THIN [analytics/refinery@2f94afe0] (duration: 00m 06s)
  • 19:35 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe] (thin): Regular analytics weekly train THIN [analytics/refinery@2f94afe0]
  • 19:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 19:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53457 and previous config saved to /var/cache/conftool/dbconfig/20231114-193217-arnaudb.json
  • 19:32 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe]: Regular analytics weekly train [analytics/refinery@2f94afe0] (duration: 07m 04s)
  • 19:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53456 and previous config saved to /var/cache/conftool/dbconfig/20231114-193156-arnaudb.json
  • 19:25 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe]: Regular analytics weekly train [analytics/refinery@2f94afe0]
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on moscovium.eqiad.wmnet with reason: maintenance
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on moscovium.eqiad.wmnet with reason: maintenance
  • 19:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 19:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P53455 and previous config saved to /var/cache/conftool/dbconfig/20231114-191649-arnaudb.json
  • 19:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1011.eqiad.wmnet with OS bullseye
  • 19:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 19:14 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.5 refs T350081
  • 19:13 ejegg: fundraising civicrm upgraded from 88361167 to ec6992e0
  • 19:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 19:04 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: stewards
  • 19:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P53454 and previous config saved to /var/cache/conftool/dbconfig/20231114-190143-arnaudb.json
  • 18:58 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: stewards
  • 18:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 18:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 18:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: host reimage
  • 18:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 18:50 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: host reimage
  • 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53453 and previous config saved to /var/cache/conftool/dbconfig/20231114-184637-arnaudb.json
  • 18:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53452 and previous config saved to /var/cache/conftool/dbconfig/20231114-184204-arnaudb.json
  • 18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53451 and previous config saved to /var/cache/conftool/dbconfig/20231114-184142-arnaudb.json
  • 18:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1011.eqiad.wmnet with OS bullseye
  • 18:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 18:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 18:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 18:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P53450 and previous config saved to /var/cache/conftool/dbconfig/20231114-182636-arnaudb.json
  • 18:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 18:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 18:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P53449 and previous config saved to /var/cache/conftool/dbconfig/20231114-181130-arnaudb.json
  • 18:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 18:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 17:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53448 and previous config saved to /var/cache/conftool/dbconfig/20231114-175623-arnaudb.json
  • 17:55 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:54 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::control
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53447 and previous config saved to /var/cache/conftool/dbconfig/20231114-175202-arnaudb.json
  • 17:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53446 and previous config saved to /var/cache/conftool/dbconfig/20231114-175140-arnaudb.json
  • 17:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:43 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::control
  • 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P53445 and previous config saved to /var/cache/conftool/dbconfig/20231114-173634-arnaudb.json
  • 17:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 17:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P53444 and previous config saved to /var/cache/conftool/dbconfig/20231114-172127-arnaudb.json
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53442 and previous config saved to /var/cache/conftool/dbconfig/20231114-170621-arnaudb.json
  • 17:03 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 17:02 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53441 and previous config saved to /var/cache/conftool/dbconfig/20231114-170158-arnaudb.json
  • 17:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53440 and previous config saved to /var/cache/conftool/dbconfig/20231114-170136-arnaudb.json
  • 16:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@0ae1184]: make cirrus index imports world readable in hdfs (duration: 00m 28s)
  • 16:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@0ae1184]: make cirrus index imports world readable in hdfs
  • 16:47 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:47 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P53438 and previous config saved to /var/cache/conftool/dbconfig/20231114-164630-arnaudb.json
  • 16:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 16:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 16:35 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@017fbf1]: search: clean wcqs revision map (duration: 00m 29s)
  • 16:34 ebernhardson@deploy2002: Started deploy [airflow-dags/search@017fbf1]: search: clean wcqs revision map
  • 16:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P53437 and previous config saved to /var/cache/conftool/dbconfig/20231114-163123-arnaudb.json
  • 16:30 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1002.eqiad.wmnet
  • 16:26 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1002.eqiad.wmnet
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 16:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53436 and previous config saved to /var/cache/conftool/dbconfig/20231114-161617-arnaudb.json
  • 16:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 16:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 16:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53435 and previous config saved to /var/cache/conftool/dbconfig/20231114-161157-arnaudb.json
  • 16:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::serviceops_collab
  • 16:11 brennen@deploy2002: Finished deploy [phabricator/deployment@0b76984]: deploy to phab1004 for T350876 (duration: 01m 04s)
  • 16:09 brennen@deploy2002: Started deploy [phabricator/deployment@0b76984]: deploy to phab1004 for T350876
  • 16:09 brennen@deploy2002: Finished deploy [phabricator/deployment@0b76984]: test deploy to phab2002 for T350876 (duration: 00m 32s)
  • 16:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:08 brennen@deploy2002: Started deploy [phabricator/deployment@0b76984]: test deploy to phab2002 for T350876
  • 16:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 16:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 16:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T348183)', diff saved to https://phabricator.wikimedia.org/P53434 and previous config saved to /var/cache/conftool/dbconfig/20231114-160356-arnaudb.json
  • 16:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::serviceops_collab
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:53 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:53 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:50 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host vrts1002.eqiad.wmnet
  • 15:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P53433 and previous config saved to /var/cache/conftool/dbconfig/20231114-154850-arnaudb.json
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:47 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:46 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1046']
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1046']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1046']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host vrts1002.eqiad.wmnet
  • 15:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53432 and previous config saved to /var/cache/conftool/dbconfig/20231114-153355-arnaudb.json
  • 15:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P53431 and previous config saved to /var/cache/conftool/dbconfig/20231114-153344-arnaudb.json
  • 15:32 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:32 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:29 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:29 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:23 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::analytics_replica
  • 15:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1046']
  • 15:21 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 15:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 90%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53430 and previous config saved to /var/cache/conftool/dbconfig/20231114-151850-arnaudb.json
  • 15:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1236 (T348183)', diff saved to https://phabricator.wikimedia.org/P53428 and previous config saved to /var/cache/conftool/dbconfig/20231114-151602-arnaudb.json
  • 15:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53427 and previous config saved to /var/cache/conftool/dbconfig/20231114-151541-arnaudb.json
  • 15:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::analytics_replica
  • 15:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53426 and previous config saved to /var/cache/conftool/dbconfig/20231114-150345-arnaudb.json
  • 15:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P53425 and previous config saved to /var/cache/conftool/dbconfig/20231114-150034-arnaudb.json
  • 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::backups
  • 14:53 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:52 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:52 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:51 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:50 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 14:50 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::backups
  • 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53423 and previous config saved to /var/cache/conftool/dbconfig/20231114-144840-arnaudb.json
  • 14:46 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:46 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P53421 and previous config saved to /var/cache/conftool/dbconfig/20231114-144528-arnaudb.json
  • 14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:44 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:42 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bullseye
  • 14:38 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:38 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 45%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53420 and previous config saved to /var/cache/conftool/dbconfig/20231114-143335-arnaudb.json
  • 14:32 fabfur: swapped cp1105 <-> cp1080 (T349244)
  • 14:32 urbanecm@deploy2002: Finished scap: Backport for IP Masking: Expire temporary accounts in 1 year (T344695), TempUser: Fix unchecked array access for optional key, IP Masking: Add expireTemporaryAccounts.php (T344695) (duration: 07m 03s)
  • 14:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1105.eqiad.wmnet
  • 14:31 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1105.eqiad.wmnet
  • 14:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53418 and previous config saved to /var/cache/conftool/dbconfig/20231114-143021-arnaudb.json
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:28 fabfur: swapped cp1104 <-> cp1079 (T349244)
  • 14:27 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 14:26 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:26 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1104.eqiad.wmnet
  • 14:26 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1104.eqiad.wmnet
  • 14:26 urbanecm@deploy2002: urbanecm: Backport for IP Masking: Expire temporary accounts in 1 year (T344695), TempUser: Fix unchecked array access for optional key, IP Masking: Add expireTemporaryAccounts.php (T344695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53417 and previous config saved to /var/cache/conftool/dbconfig/20231114-142608-arnaudb.json
  • 14:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:25 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53416 and previous config saved to /var/cache/conftool/dbconfig/20231114-142547-arnaudb.json
  • 14:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
  • 14:22 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 14:20 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 14:20 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 30%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53415 and previous config saved to /var/cache/conftool/dbconfig/20231114-141830-arnaudb.json
  • 14:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P53414 and previous config saved to /var/cache/conftool/dbconfig/20231114-141041-arnaudb.json
  • 14:04 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bullseye
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 15%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53413 and previous config saved to /var/cache/conftool/dbconfig/20231114-140325-arnaudb.json
  • 13:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P53412 and previous config saved to /var/cache/conftool/dbconfig/20231114-135534-arnaudb.json
  • 13:48 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_cache::storage
  • 13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1138.eqiad.wmnet onto db1238.eqiad.wmnet
  • 13:43 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 13:43 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 13:42 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 13:42 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 13:41 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_cache::storage
  • 13:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53411 and previous config saved to /var/cache/conftool/dbconfig/20231114-134028-arnaudb.json
  • 13:38 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53410 and previous config saved to /var/cache/conftool/dbconfig/20231114-133755-arnaudb.json
  • 13:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53409 and previous config saved to /var/cache/conftool/dbconfig/20231114-133734-arnaudb.json
  • 13:34 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-cache1003.eqiad.wmnet
  • 13:30 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcontrol2005-dev.codfw.wmnet
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: releases
  • 13:26 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P53408 and previous config saved to /var/cache/conftool/dbconfig/20231114-132227-arnaudb.json
  • 13:20 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-cache2003.codfw.wmnet
  • 13:20 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 13:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 13:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: releases
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::mariadb
  • 13:10 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P53407 and previous config saved to /var/cache/conftool/dbconfig/20231114-130721-arnaudb.json
  • 13:06 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 13:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 13:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::mariadb
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::misc::analytics::backup
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53406 and previous config saved to /var/cache/conftool/dbconfig/20231114-125214-arnaudb.json
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53405 and previous config saved to /var/cache/conftool/dbconfig/20231114-124942-arnaudb.json
  • 12:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53404 and previous config saved to /var/cache/conftool/dbconfig/20231114-124921-arnaudb.json
  • 12:48 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981 (duration: 00m 07s)
  • 12:48 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981
  • 12:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981 (duration: 00m 04s)
  • 12:46 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981
  • 12:42 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::misc::analytics::backup
  • 12:37 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:37 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P53403 and previous config saved to /var/cache/conftool/dbconfig/20231114-123414-arnaudb.json
  • 12:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::misc::analytics::backup
  • 12:20 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
  • 12:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P53402 and previous config saved to /var/cache/conftool/dbconfig/20231114-121908-arnaudb.json
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gitlab
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 12:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 12:06 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gitlab
  • 12:06 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 12:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 12:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53401 and previous config saved to /var/cache/conftool/dbconfig/20231114-120401-arnaudb.json
  • 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53400 and previous config saved to /var/cache/conftool/dbconfig/20231114-120129-arnaudb.json
  • 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::presto::server
  • 12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53399 and previous config saved to /var/cache/conftool/dbconfig/20231114-120108-arnaudb.json
  • 11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P53398 and previous config saved to /var/cache/conftool/dbconfig/20231114-114602-arnaudb.json
  • 11:45 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::presto::server
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gitlab1003.wikimedia.org
  • 11:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P53397 and previous config saved to /var/cache/conftool/dbconfig/20231114-113055-arnaudb.json
  • 11:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gitlab1003.wikimedia.org
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-presto1001.eqiad.wmnet
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53396 and previous config saved to /var/cache/conftool/dbconfig/20231114-111549-arnaudb.json
  • 11:15 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::staging::worker
  • 11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53395 and previous config saved to /var/cache/conftool/dbconfig/20231114-111316-arnaudb.json
  • 11:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53394 and previous config saved to /var/cache/conftool/dbconfig/20231114-111037-arnaudb.json
  • 11:09 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::staging::worker
  • 10:58 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-presto1001.eqiad.wmnet
  • 10:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P53393 and previous config saved to /var/cache/conftool/dbconfig/20231114-105530-arnaudb.json
  • 10:55 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1138.eqiad.wmnet onto db1238.eqiad.wmnet
  • 10:50 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging2001.codfw.wmnet
  • 10:48 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::staging::master
  • 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'migrate db1138 to db1238 - T344036', diff saved to https://phabricator.wikimedia.org/P53392 and previous config saved to /var/cache/conftool/dbconfig/20231114-104603-arnaudb.json
  • 10:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P53391 and previous config saved to /var/cache/conftool/dbconfig/20231114-104024-arnaudb.json
  • 10:40 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::staging::master
  • 10:39 arnaudb@cumin1001: dbctl commit (dc=all): 'T351184 - weight mirror', diff saved to https://phabricator.wikimedia.org/P53390 and previous config saved to /var/cache/conftool/dbconfig/20231114-103941-arnaudb.json
  • 10:38 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary T351184', diff saved to https://phabricator.wikimedia.org/P53389 and previous config saved to /var/cache/conftool/dbconfig/20231114-103601-arnaudb.json
  • 10:34 arnaudb: Starting s4 eqiad failover from db1138 to db1160 - T351184
  • 10:33 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging-ctrl2002.codfw.wmnet
  • 10:26 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging-ctrl2002.codfw.wmnet
  • 10:26 jnuche@deploy2002: Pruned MediaWiki: 1.42.0-wmf.3 (duration: 02m 06s)
  • 10:25 moritzm: imported 5.1.19+4.0.11-3+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53388 and previous config saved to /var/cache/conftool/dbconfig/20231114-102517-arnaudb.json
  • 10:24 jnuche@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.5 refs T350081 (duration: 20m 19s)
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53387 and previous config saved to /var/cache/conftool/dbconfig/20231114-102206-arnaudb.json
  • 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53386 and previous config saved to /var/cache/conftool/dbconfig/20231114-102145-arnaudb.json
  • 10:15 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::ml_etcd::staging
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T351184', diff saved to https://phabricator.wikimedia.org/P53385 and previous config saved to /var/cache/conftool/dbconfig/20231114-100843-arnaudb.json
  • 10:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T351184
  • 10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T351184
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P53384 and previous config saved to /var/cache/conftool/dbconfig/20231114-100638-arnaudb.json
  • 10:05 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::ml_etcd::staging
  • 10:03 jnuche@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.5 refs T350081
  • 09:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 09:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisioning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisioning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisioning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisioning db1238.eqiad.wmnet - T344036
  • 09:53 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master" (duration: 07m 26s)
  • 09:52 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P53383 and previous config saved to /var/cache/conftool/dbconfig/20231114-095132-arnaudb.json
  • 09:47 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:47 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:45 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master"
  • 09:45 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet
  • 09:39 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc3 master (duration: 07m 11s)
  • 09:38 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:38 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:36 jayme: reimaging kubestage2002 to verify with puppet7
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53380 and previous config saved to /var/cache/conftool/dbconfig/20231114-093625-arnaudb.json
  • 09:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
  • 09:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53379 and previous config saved to /var/cache/conftool/dbconfig/20231114-093353-arnaudb.json
  • 09:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc3 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc3 master
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging-etcd2003.codfw.wmnet
  • 09:26 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 07m 02s)
  • 09:26 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:21 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:20 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:19 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 09:12 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 07m 24s)
  • 09:07 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:06 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:05 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 09:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 08:56 godog: add 80g to prometheus/k8s-ml-serve in eqiad
  • 08:56 godog: add 80g to prometheus/ops in eqiad
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1164.eqiad.wmnet with OS bookworm
  • 08:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1164.eqiad.wmnet with OS bookworm
  • 08:06 moritzm: installing nghttp2 security updates
  • 08:04 marostegui: Failover m1 from db1164 to db1119 - T350022
  • 07:59 moritzm: installing dbus security updates on bullseye
  • 07:39 jynus: stop bacula dir (and puppet) at backup1001 T350022
  • 07:27 vgutierrez: include golang-github-mmatczuk-anyflag_0.0~git20231026.5f42d2f in apt.wm.org (bookworm)
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 05:45 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 05:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 05:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:18 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:17 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 05:05 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 05:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 04:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.5 refs T350081 (duration: 51m 15s)
  • 04:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.5 refs T350081
  • 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 03:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 03:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 03:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 03:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 03:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 03:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 03:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 03:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 02:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 02:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 02:43 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 02:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 00:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 00:31 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 00:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards1001.eqiad.wmnet with OS bookworm
  • 00:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 00:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm

2023-11-13

  • 23:57 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 23:57 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1021.eqiad.wmnet
  • 23:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1021.eqiad.wmnet
  • 23:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 23:13 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:13 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:12 mutante: wmf-reimage for stewards1001 failed with [self-signed certificate in certificate chain]
  • 23:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 23:10 tgr: UTC late deploys done
  • 22:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 22:55 tgr@deploy2002: Finished scap: Backport for session: Remove incorrect warning (T348852) (duration: 08m 03s)
  • 22:52 root@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 22:49 root@cumin2002: START - Cookbook sre.hosts.decommission for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 22:49 tgr@deploy2002: tgr: Continuing with sync
  • 22:48 tgr@deploy2002: tgr: Backport for session: Remove incorrect warning (T348852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:47 tgr@deploy2002: Started scap: Backport for session: Remove incorrect warning (T348852)
  • 22:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 22:41 tgr@deploy2002: Finished scap: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852) (duration: 18m 17s)
  • 22:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 22:35 tgr@deploy2002: tgr: Continuing with sync
  • 22:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 22:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 22:24 tgr@deploy2002: tgr: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 22:22 tgr@deploy2002: Started scap: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852)
  • 22:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 22:12 bvibber: brion halting requeueTranscode jobs to let queues even out before continuing with lighter load
  • 22:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 22:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 21:54 urbanecm@deploy2002: Finished scap: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113) (duration: 18m 34s)
  • 21:49 urbanecm@deploy2002: urbanecm and ssastry and tgr: Continuing with sync
  • 21:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 21:48 bking@deploy2002: Finished deploy [search/mjolnir/deploy@0f8bb60]: (no justification provided) (duration: 00m 35s)
  • 21:47 bking@deploy2002: Started deploy [search/mjolnir/deploy@0f8bb60]: (no justification provided)
  • 21:46 inflatador: bking@deploy2002 deploy mjolnir 2.4.0 on newly-built bullseye hosts T346039
  • 21:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 21:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53376 and previous config saved to /var/cache/conftool/dbconfig/20231113-214411-arnaudb.json
  • 21:40 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 21:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 21:37 urbanecm@deploy2002: urbanecm and ssastry and tgr: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:36 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 21:36 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 21:36 urbanecm@deploy2002: Started scap: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113)
  • 21:34 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 21:33 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 21:32 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:32 urbanecm@deploy2002: Finished scap: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996) (duration: 10m 37s)
  • 21:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 21:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P53375 and previous config saved to /var/cache/conftool/dbconfig/20231113-212904-arnaudb.json
  • 21:28 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:26 urbanecm@deploy2002: urbanecm and brion and dani: Continuing with sync
  • 21:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 21:22 urbanecm@deploy2002: urbanecm and brion and dani: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996) synced to the testservers (https://wikitech.wiki
  • 21:21 urbanecm@deploy2002: Started scap: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996)
  • 21:20 urbanecm@deploy2002: Sync cancelled.
  • 21:19 urbanecm@deploy2002: urbanecm and dani: Backport for Undeploy pilot survey on metawiki (T349854) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1030.eqiad.wmnet
  • 21:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 21:18 urbanecm@deploy2002: Started scap: Backport for Undeploy pilot survey on metawiki (T349854)
  • 21:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:16 urbanecm@deploy2002: Finished scap: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951) (duration: 10m 15s)
  • 21:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P53374 and previous config saved to /var/cache/conftool/dbconfig/20231113-211358-arnaudb.json
  • 21:11 urbanecm@deploy2002: dani and kemayo and urbanecm: Continuing with sync
  • 21:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1030.eqiad.wmnet
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1029.eqiad.wmnet
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:09 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:07 urbanecm@deploy2002: dani and kemayo and urbanecm: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 21:06 urbanecm@deploy2002: Started scap: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951)
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1029.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1028.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:00 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 20:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53373 and previous config saved to /var/cache/conftool/dbconfig/20231113-205852-arnaudb.json
  • 20:58 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:53 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1028.eqiad.wmnet
  • 20:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1023.eqiad.wmnet with reason: T347504
  • 20:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1023.eqiad.wmnet with reason: T347504
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1027.eqiad.wmnet
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T347504
  • 20:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T347504
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1022.eqiad.wmnet with reason: T347504
  • 20:51 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1022.eqiad.wmnet with reason: T347504
  • 20:51 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53372 and previous config saved to /var/cache/conftool/dbconfig/20231113-205032-arnaudb.json
  • 20:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53371 and previous config saved to /var/cache/conftool/dbconfig/20231113-205010-arnaudb.json
  • 20:49 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 urbanecm: mwmaint2002: `mwscript extensions/GrowthExperiments/maintenance/reassignMentees.php --wiki=arwiki --all --performer='Martin Urbanec (WMF)'` (T330071)
  • 20:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1027.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1026.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:43 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 20:42 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:36 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 20:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1026.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1025.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P53370 and previous config saved to /var/cache/conftool/dbconfig/20231113-203504-arnaudb.json
  • 20:34 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1025.eqiad.wmnet
  • 20:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P53369 and previous config saved to /var/cache/conftool/dbconfig/20231113-201957-arnaudb.json
  • 20:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 20:15 ebernhardson: start reindex of enwiki indexes in cloudelastic search cluster from mwmaint2002
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53368 and previous config saved to /var/cache/conftool/dbconfig/20231113-200451-arnaudb.json
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53367 and previous config saved to /var/cache/conftool/dbconfig/20231113-195934-arnaudb.json
  • 19:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53366 and previous config saved to /var/cache/conftool/dbconfig/20231113-195913-arnaudb.json
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 19:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:55 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:48 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P53365 and previous config saved to /var/cache/conftool/dbconfig/20231113-194406-arnaudb.json
  • 19:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P53364 and previous config saved to /var/cache/conftool/dbconfig/20231113-192900-arnaudb.json
  • 19:25 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:17 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 19:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53363 and previous config saved to /var/cache/conftool/dbconfig/20231113-191354-arnaudb.json
  • 19:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53362 and previous config saved to /var/cache/conftool/dbconfig/20231113-190849-arnaudb.json
  • 19:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53361 and previous config saved to /var/cache/conftool/dbconfig/20231113-190827-arnaudb.json
  • 19:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 18:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P53360 and previous config saved to /var/cache/conftool/dbconfig/20231113-185321-arnaudb.json
  • 18:50 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:50 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:42 sukhe: pool cp4052 as first cp host for bookworm testing: T342154
  • 18:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P53359 and previous config saved to /var/cache/conftool/dbconfig/20231113-183814-arnaudb.json
  • 18:28 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: microsites::peopleweb
  • 18:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53358 and previous config saved to /var/cache/conftool/dbconfig/20231113-182308-arnaudb.json
  • 18:20 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: microsites::peopleweb
  • 18:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53357 and previous config saved to /var/cache/conftool/dbconfig/20231113-181751-arnaudb.json
  • 18:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53356 and previous config saved to /var/cache/conftool/dbconfig/20231113-181729-arnaudb.json
  • 18:16 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host people2003.codfw.wmnet
  • 18:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:08 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:06 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P53355 and previous config saved to /var/cache/conftool/dbconfig/20231113-180222-arnaudb.json
  • 17:59 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host people2003.codfw.wmnet
  • 17:59 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:59 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:59 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P53354 and previous config saved to /var/cache/conftool/dbconfig/20231113-174716-arnaudb.json
  • 17:36 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 17:35 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:33 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 17:33 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:32 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 17:32 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53353 and previous config saved to /var/cache/conftool/dbconfig/20231113-173209-arnaudb.json
  • 17:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 17:31 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:31 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:28 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53352 and previous config saved to /var/cache/conftool/dbconfig/20231113-172748-arnaudb.json
  • 17:27 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53351 and previous config saved to /var/cache/conftool/dbconfig/20231113-172712-arnaudb.json
  • 17:27 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:26 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:21 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:21 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:20 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:17 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:16 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:15 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:14 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 17:13 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P53350 and previous config saved to /var/cache/conftool/dbconfig/20231113-171205-arnaudb.json
  • 17:10 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:08 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:06 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:06 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:05 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:05 ottomata: deploying eventgates to pick up change to use mw-api-int-async-ro with retries - T326002
  • 17:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:04 otto@deploy2002: Finished deploy [analytics/refinery@25ef91f]: deploying refinery with refinery-source 0.2.25 jars for T321854 [analytics/refinery@25ef91f2] (duration: 06m 36s)
  • 16:57 otto@deploy2002: Started deploy [analytics/refinery@25ef91f]: deploying refinery with refinery-source 0.2.25 jars for T321854 [analytics/refinery@25ef91f2]
  • 16:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P53349 and previous config saved to /var/cache/conftool/dbconfig/20231113-165659-arnaudb.json
  • 16:51 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 42s)
  • 16:46 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
  • 16:43 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:43 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:43 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:42 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53348 and previous config saved to /var/cache/conftool/dbconfig/20231113-164152-arnaudb.json
  • 16:40 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 16:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 16:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53347 and previous config saved to /var/cache/conftool/dbconfig/20231113-163730-arnaudb.json
  • 16:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53346 and previous config saved to /var/cache/conftool/dbconfig/20231113-163709-arnaudb.json
  • 16:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P53345 and previous config saved to /var/cache/conftool/dbconfig/20231113-162202-arnaudb.json
  • 16:14 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:13 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:12 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 16:11 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:11 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:09 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P53344 and previous config saved to /var/cache/conftool/dbconfig/20231113-160656-arnaudb.json
  • 15:55 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 15:54 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:52 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53343 and previous config saved to /var/cache/conftool/dbconfig/20231113-155149-arnaudb.json
  • 15:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53342 and previous config saved to /var/cache/conftool/dbconfig/20231113-154641-arnaudb.json
  • 15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53341 and previous config saved to /var/cache/conftool/dbconfig/20231113-154044-arnaudb.json
  • 15:39 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:38 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:31 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:31 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P53340 and previous config saved to /var/cache/conftool/dbconfig/20231113-152537-arnaudb.json
  • 15:14 fabfur: swapped cp1103 <-> cp1078 (T349244)
  • 15:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
  • 15:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1103.eqiad.wmnet
  • 15:13 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1103.eqiad.wmnet
  • 15:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P53339 and previous config saved to /var/cache/conftool/dbconfig/20231113-151031-arnaudb.json
  • 15:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
  • 15:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
  • 15:07 fabfur: swapped cp1102 <-> cp1077 (T349244)
  • 15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1102.eqiad.wmnet
  • 15:04 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1102.eqiad.wmnet
  • 15:01 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:00 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
  • 14:59 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:59 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:58 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:58 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1018.eqiad.wmnet
  • 14:56 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:56 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53338 and previous config saved to /var/cache/conftool/dbconfig/20231113-145524-arnaudb.json
  • 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53337 and previous config saved to /var/cache/conftool/dbconfig/20231113-145223-arnaudb.json
  • 14:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 14:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 14:51 urbanecm: mwmaint2002: stop `extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki frwiki` again, memory leak didn't stop (T315510)
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53336 and previous config saved to /var/cache/conftool/dbconfig/20231113-144947-arnaudb.json
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
  • 14:43 urbanecm: mwmaint2002: foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php MediaModeration (T350321)
  • 14:41 bblack: cp2027: varnish-frontend-restart to test tcp listen port changes
  • 14:40 urbanecm@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321) (duration: 09m 13s)
  • 14:38 urbanecm: mwmaint2002: Start several instances of `extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php` (T315510)
  • 14:35 urbanecm@deploy2002: urbanecm and dani: Continuing with sync
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P53335 and previous config saved to /var/cache/conftool/dbconfig/20231113-143440-arnaudb.json
  • 14:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host clouddb1017.eqiad.wmnet
  • 14:32 urbanecm@deploy2002: urbanecm and dani: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:31 urbanecm@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321)
  • 14:30 urbanecm@deploy2002: Finished scap: Backport for ParserOutputAccess: Limit local cache size (T315510) (duration: 06m 42s)
  • 14:30 moritzm: installing debianutils bugfix updates from Bookworm point release
  • 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:24 urbanecm@deploy2002: Started scap: Backport for ParserOutputAccess: Limit local cache size (T315510)
  • 14:22 urbanecm@deploy2002: Finished scap: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321) (duration: 06m 58s)
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P53334 and previous config saved to /var/cache/conftool/dbconfig/20231113-141934-arnaudb.json
  • 14:16 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:16 urbanecm@deploy2002: urbanecm: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 urbanecm@deploy2002: Started scap: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321)
  • 14:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53333 and previous config saved to /var/cache/conftool/dbconfig/20231113-140427-arnaudb.json
  • 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53332 and previous config saved to /var/cache/conftool/dbconfig/20231113-140136-arnaudb.json
  • 14:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 14:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53331 and previous config saved to /var/cache/conftool/dbconfig/20231113-140115-arnaudb.json
  • 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: requesttracker
  • 13:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P53330 and previous config saved to /var/cache/conftool/dbconfig/20231113-134608-arnaudb.json
  • 13:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: requesttracker
  • 13:45 moritzm: restarting FPM/Apache on mw canaries
  • 13:42 moritzm: installing nghttp2 security updates
  • 13:38 moritzm: installing tomcat9 security updates
  • 13:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P53329 and previous config saved to /var/cache/conftool/dbconfig/20231113-133102-arnaudb.json
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::jumbo::broker
  • 13:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53328 and previous config saved to /var/cache/conftool/dbconfig/20231113-131556-arnaudb.json
  • 13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53327 and previous config saved to /var/cache/conftool/dbconfig/20231113-131207-arnaudb.json
  • 13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53326 and previous config saved to /var/cache/conftool/dbconfig/20231113-131146-arnaudb.json
  • 13:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::jumbo::broker
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P53325 and previous config saved to /var/cache/conftool/dbconfig/20231113-125640-arnaudb.json
  • 12:55 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:43 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 12:42 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2004.codfw.wmnet with OS bullseye
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P53324 and previous config saved to /var/cache/conftool/dbconfig/20231113-124133-arnaudb.json
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host centrallog2002.codfw.wmnet
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:34 effie: restarting memcached on mc2038
  • 12:32 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 12:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:31 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2003.codfw.wmnet
  • 12:29 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:28 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1004.eqiad.wmnet with OS bullseye
  • 12:28 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:28 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:27 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53323 and previous config saved to /var/cache/conftool/dbconfig/20231113-122627-arnaudb.json
  • 12:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53322 and previous config saved to /var/cache/conftool/dbconfig/20231113-122332-arnaudb.json
  • 12:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 12:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 12:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53321 and previous config saved to /var/cache/conftool/dbconfig/20231113-122310-arnaudb.json
  • 12:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host centrallog2002.codfw.wmnet
  • 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest2003.codfw.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kafka-jumbo1007.eqiad.wmnet
  • 12:16 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1004.eqiad.wmnet with OS bullseye
  • 12:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host kafka-jumbo1007.eqiad.wmnet
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P53320 and previous config saved to /var/cache/conftool/dbconfig/20231113-120803-arnaudb.json
  • 12:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P53319 and previous config saved to /var/cache/conftool/dbconfig/20231113-115257-arnaudb.json
  • 11:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
  • 11:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
  • 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53318 and previous config saved to /var/cache/conftool/dbconfig/20231113-113751-arnaudb.json
  • 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
  • 11:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53317 and previous config saved to /var/cache/conftool/dbconfig/20231113-113458-arnaudb.json
  • 11:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53316 and previous config saved to /var/cache/conftool/dbconfig/20231113-113437-arnaudb.json
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
  • 11:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host prometheus4002.ulsfo.wmnet
  • 11:20 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P53315 and previous config saved to /var/cache/conftool/dbconfig/20231113-111930-arnaudb.json
  • 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: webperf
  • 11:07 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
  • 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P53314 and previous config saved to /var/cache/conftool/dbconfig/20231113-110424-arnaudb.json
  • 11:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: webperf
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudmetrics[1003-1004].eqiad.wmnet
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudmetrics[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 11:00 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudmetrics[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 10:57 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 jbond: roll restart pybal after failed etcd cr
  • 10:49 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudmetrics[1003-1004].eqiad.wmnet
  • 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53313 and previous config saved to /var/cache/conftool/dbconfig/20231113-104917-arnaudb.json
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53312 and previous config saved to /var/cache/conftool/dbconfig/20231113-104534-arnaudb.json
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53311 and previous config saved to /var/cache/conftool/dbconfig/20231113-104245-arnaudb.json
  • 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 983
  • 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P53309 and previous config saved to /var/cache/conftool/dbconfig/20231113-102739-arnaudb.json
  • 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'depool T350458', diff saved to https://phabricator.wikimedia.org/P53308 and previous config saved to /var/cache/conftool/dbconfig/20231113-102730-arnaudb.json
  • 10:24 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: graphite::production
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P53307 and previous config saved to /var/cache/conftool/dbconfig/20231113-095725-arnaudb.json
  • 09:44 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: graphite::production
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: arclamp
  • 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53306 and previous config saved to /var/cache/conftool/dbconfig/20231113-094218-arnaudb.json
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53305 and previous config saved to /var/cache/conftool/dbconfig/20231113-093824-arnaudb.json
  • 09:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53304 and previous config saved to /var/cache/conftool/dbconfig/20231113-093802-arnaudb.json
  • 09:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: arclamp
  • 09:31 moritzm: installing dbus security updates on bullseye
  • 09:31 jnuche@deploy2002: rebuilt and synchronized wikiversions files: labswiki to 1.42.0-wmf.4 (T350836 T350080)
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P53303 and previous config saved to /var/cache/conftool/dbconfig/20231113-092256-arnaudb.json
  • 09:17 hashar@deploy2002: Finished deploy [integration/docroot@9bf1967]: Replace WikimediaUI Base with Codex design tokens T331403 T334934 (duration: 00m 07s)
  • 09:16 hashar@deploy2002: Started deploy [integration/docroot@9bf1967]: Replace WikimediaUI Base with Codex design tokens T331403 T334934
  • 09:14 jnuche@deploy2002: Finished scap: Backport for Fix BlockDisablesLogin recursion (T350836 T350080) (duration: 07m 49s)
  • 09:08 jnuche@deploy2002: bd808 and jnuche: Continuing with sync
  • 09:08 jnuche@deploy2002: bd808 and jnuche: Backport for Fix BlockDisablesLogin recursion (T350836 T350080) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P53302 and previous config saved to /var/cache/conftool/dbconfig/20231113-090750-arnaudb.json
  • 09:06 jnuche@deploy2002: Started scap: Backport for Fix BlockDisablesLogin recursion (T350836 T350080)
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host webperf2003.codfw.wmnet
  • 08:55 godog: bounce prometheus eqiad for k8s / k8s-aux - T343529
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53301 and previous config saved to /var/cache/conftool/dbconfig/20231113-085243-arnaudb.json
  • 08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53300 and previous config saved to /var/cache/conftool/dbconfig/20231113-084945-arnaudb.json
  • 08:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host webperf2003.codfw.wmnet
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host graphite2004.codfw.wmnet
  • 08:34 hashar@deploy2002: Finished deploy [integration/docroot@bc8aaba]: Add more libraries to doc.wikimedia.org homepage - T327604 (duration: 00m 06s)
  • 08:34 hashar@deploy2002: Started deploy [integration/docroot@bc8aaba]: Add more libraries to doc.wikimedia.org homepage - T327604
  • 08:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host graphite2004.codfw.wmnet
  • 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host arclamp2001.codfw.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host arclamp2001.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: search::loader
  • 07:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: search::loader

2023-11-12

  • 21:28 jiji@cumin2002: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 21:27 jiji@cumin2002: START - Cookbook sre.mediawiki.restart-appservers
  • 21:26 effie: restart php-fpm on jobrunners

2023-11-11

  • 01:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 01:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 01:17 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 01:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 00:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bookworm

2023-11-10

  • 23:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 23:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bookworm
  • 21:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 20:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 20:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 18:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 18:04 bvibber: brion adding more vp9 backfill to the transcode runs on mwmaint2002 (requeueTranscodes -> job queue runners). Should increase load on transcode scaler job runners but not elsewhere
  • 17:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 17:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 17:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 17:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 17:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 17:48 topranks: withdrawing IPv6 prefixes announced to AS1299 in esams to troubleshoot connectivity problem report
  • 17:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 17:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 17:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:59 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:51 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 16:49 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudvirt1062.private.eqiad.wikimedia.cloud on all recursors
  • 16:36 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache cloudvirt1062.private.eqiad.wikimedia.cloud on all recursors
  • 16:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add cloud-private subnet entries for new cloudvirt hosts - cmooney@cumin1001"
  • 16:33 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add cloud-private subnet entries for new cloudvirt hosts - cmooney@cumin1001"
  • 16:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 15:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm
  • 15:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 15:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 15:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm
  • 14:15 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to v23.10.0 - T349492 (duration: 00m 10s)
  • 14:15 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to v23.10.0 - T349492
  • 14:11 denisse: upgrading LibreNMS to 23.10
  • 13:46 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:44 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:23 moritzm: imported php-geoip 1.1.1-7+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 13:05 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 12:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:34 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:25 moritzm: imported php-pcov 1.0.6-4+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 12:12 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:04 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:58 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:50 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:50 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:46 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:18 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 11:17 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:16 moritzm: imported tideways 5.0.4-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:05 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:41 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:41 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:38 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:38 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 10:32 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 10:25 moritzm: imported dh-php 0.35+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:16 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:10 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:09 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:05 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:02 moritzm: imported php-excimer 1.0.2-1+wmf3+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:02 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:01 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 09:57 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:57 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 09:29 moritzm: imported wikidiff2 1.14.1-0+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:12 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 09:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:07 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 08:35 moritzm: imported php-defaults 2:7.4+76+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 08:35 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:34 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:01 moritzm: imported php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf11u1 to component/php74 for bullseye-wikimedia
  • 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::search_platform
  • 07:01 vgutierrez: cleaning up digicert-2022 update-ocsp config bits from cp servers
  • 06:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::search_platform
  • 03:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53289 and previous config saved to /var/cache/conftool/dbconfig/20231110-032053-arnaudb.json
  • 03:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P53288 and previous config saved to /var/cache/conftool/dbconfig/20231110-030547-arnaudb.json
  • 02:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P53287 and previous config saved to /var/cache/conftool/dbconfig/20231110-025041-arnaudb.json
  • 02:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53286 and previous config saved to /var/cache/conftool/dbconfig/20231110-023534-arnaudb.json
  • 02:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53285 and previous config saved to /var/cache/conftool/dbconfig/20231110-022351-arnaudb.json
  • 02:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 02:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 02:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53284 and previous config saved to /var/cache/conftool/dbconfig/20231110-022330-arnaudb.json
  • 02:15 tzatziki: removing 3 files for legal compliance
  • 02:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P53283 and previous config saved to /var/cache/conftool/dbconfig/20231110-020823-arnaudb.json
  • 01:58 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P53282 and previous config saved to /var/cache/conftool/dbconfig/20231110-015317-arnaudb.json
  • 01:42 tzatziki: removing 16 files for legal compliance
  • 01:39 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53281 and previous config saved to /var/cache/conftool/dbconfig/20231110-013810-arnaudb.json
  • 01:36 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
  • 01:21 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:20 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53280 and previous config saved to /var/cache/conftool/dbconfig/20231110-011712-arnaudb.json
  • 01:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 01:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53279 and previous config saved to /var/cache/conftool/dbconfig/20231110-011701-arnaudb.json
  • 01:15 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:13 bd808: SAL test (T343157)
  • 01:10 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:10 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 01:02 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P53278 and previous config saved to /var/cache/conftool/dbconfig/20231110-010154-arnaudb.json
  • 01:00 wfan: update fraud filter, config revision changed from 4cfbb04b to 39a846b3
  • 00:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 00:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P53277 and previous config saved to /var/cache/conftool/dbconfig/20231110-004647-arnaudb.json
  • 00:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:44 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 00:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:31 tzatziki: removing 1 file for legal compliance
  • 00:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53276 and previous config saved to /var/cache/conftool/dbconfig/20231110-003141-arnaudb.json
  • 00:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53275 and previous config saved to /var/cache/conftool/dbconfig/20231110-002747-arnaudb.json
  • 00:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 00:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 00:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53274 and previous config saved to /var/cache/conftool/dbconfig/20231110-002725-arnaudb.json
  • 00:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P53273 and previous config saved to /var/cache/conftool/dbconfig/20231110-001219-arnaudb.json
  • 00:09 ejegg: fundraising python tools upgraded from a4cbbbe7 to 117e1f9c
  • 00:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53272 and previous config saved to /var/cache/conftool/dbconfig/20231110-000322-root.json

2023-11-09

  • 23:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P53271 and previous config saved to /var/cache/conftool/dbconfig/20231109-235712-arnaudb.json
  • 23:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53270 and previous config saved to /var/cache/conftool/dbconfig/20231109-234817-root.json
  • 23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53269 and previous config saved to /var/cache/conftool/dbconfig/20231109-234206-arnaudb.json
  • 23:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53268 and previous config saved to /var/cache/conftool/dbconfig/20231109-233816-arnaudb.json
  • 23:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53267 and previous config saved to /var/cache/conftool/dbconfig/20231109-233728-arnaudb.json
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53266 and previous config saved to /var/cache/conftool/dbconfig/20231109-233312-root.json
  • 23:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P53265 and previous config saved to /var/cache/conftool/dbconfig/20231109-232221-arnaudb.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53264 and previous config saved to /var/cache/conftool/dbconfig/20231109-231807-root.json
  • 23:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P53263 and previous config saved to /var/cache/conftool/dbconfig/20231109-230715-arnaudb.json
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53262 and previous config saved to /var/cache/conftool/dbconfig/20231109-230302-root.json
  • 22:59 ejegg: payments-wiki upgraded from 6f27bf65 to 2018a390
  • 22:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53261 and previous config saved to /var/cache/conftool/dbconfig/20231109-225208-arnaudb.json
  • 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53260 and previous config saved to /var/cache/conftool/dbconfig/20231109-224818-arnaudb.json
  • 22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 22:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53259 and previous config saved to /var/cache/conftool/dbconfig/20231109-224757-arnaudb.json
  • 22:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P53258 and previous config saved to /var/cache/conftool/dbconfig/20231109-223250-arnaudb.json
  • 22:28 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P53257 and previous config saved to /var/cache/conftool/dbconfig/20231109-221744-arnaudb.json
  • 22:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:08 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53256 and previous config saved to /var/cache/conftool/dbconfig/20231109-220238-arnaudb.json
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53255 and previous config saved to /var/cache/conftool/dbconfig/20231109-215741-arnaudb.json
  • 21:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53254 and previous config saved to /var/cache/conftool/dbconfig/20231109-215719-arnaudb.json
  • 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P53253 and previous config saved to /var/cache/conftool/dbconfig/20231109-214213-arnaudb.json
  • 21:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2146.codfw.wmnet with OS bookworm
  • 21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P53252 and previous config saved to /var/cache/conftool/dbconfig/20231109-212707-arnaudb.json
  • 21:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: host reimage
  • 21:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: host reimage
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53251 and previous config saved to /var/cache/conftool/dbconfig/20231109-211200-arnaudb.json
  • 21:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53250 and previous config saved to /var/cache/conftool/dbconfig/20231109-210806-arnaudb.json
  • 21:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53249 and previous config saved to /var/cache/conftool/dbconfig/20231109-210744-arnaudb.json
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old crX-codfw sandbox int IPs - cmooney@cumin1001"
  • 21:03 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old crX-codfw sandbox int IPs - cmooney@cumin1001"
  • 21:02 brennen: no patches for UTC late backport & config
  • 21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2146.codfw.wmnet with OS bookworm
  • 20:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2146 T350916', diff saved to https://phabricator.wikimedia.org/P53248 and previous config saved to /var/cache/conftool/dbconfig/20231109-205445-root.json
  • 20:54 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1005.eqiad.wmnet with OS bullseye
  • 20:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:54 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 20:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P53247 and previous config saved to /var/cache/conftool/dbconfig/20231109-205238-arnaudb.json
  • 20:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.3.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:45 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.3.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:41 topranks: change anycast gw type to single-IP on ssw1-aX-codfw for sandbox1-a-codfw vlan (T350579)
  • 20:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P53246 and previous config saved to /var/cache/conftool/dbconfig/20231109-203731-arnaudb.json
  • 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 20:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 20:32 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1005.eqiad.wmnet with reason: host reimage
  • 20:29 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for ssw1-aX-codfw xlink IPs. - cmooney@cumin1001"
  • 20:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for ssw1-aX-codfw xlink IPs. - cmooney@cumin1001"
  • 20:28 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1005.eqiad.wmnet with reason: host reimage
  • 20:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53245 and previous config saved to /var/cache/conftool/dbconfig/20231109-202225-arnaudb.json
  • 20:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53244 and previous config saved to /var/cache/conftool/dbconfig/20231109-201830-arnaudb.json
  • 20:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 20:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 20:15 topranks: resetting asw-a-codfw et-2/0/52 to shift traffic away from ssw1-a8-codfw (T347191)
  • 20:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53243 and previous config saved to /var/cache/conftool/dbconfig/20231109-201409-arnaudb.json
  • 20:13 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 20:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,ssw1-a8-codfw,ssw1-a8-codfw.mgmt with reason: Adjust vlans trunked to asw-a-codfw from ssw1-a8-codfw T347191
  • 20:12 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,ssw1-a8-codfw,ssw1-a8-codfw.mgmt with reason: Adjust vlans trunked to asw-a-codfw from ssw1-a8-codfw T347191
  • 20:10 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 20:08 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 20:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:06 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 19:59 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P53242 and previous config saved to /var/cache/conftool/dbconfig/20231109-195903-arnaudb.json
  • 19:55 volans@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 19:52 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 19:51 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2012.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 19:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 19:50 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bookworm
  • 19:50 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:48 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 19:47 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 19:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2012.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 19:45 urbanecm@deploy2002: Finished scap: Backport for wikimaniawiki: Revert wordmark and tagline back (T350640) (duration: 07m 22s)
  • 19:44 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 19:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P53241 and previous config saved to /var/cache/conftool/dbconfig/20231109-194357-arnaudb.json
  • 19:43 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bookworm
  • 19:41 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:38 volans@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:38 urbanecm@deploy2002: Started scap: Backport for wikimaniawiki: Revert wordmark and tagline back (T350640)
  • 19:34 urbanecm@deploy2002: Finished scap: Backport for wikimaniawiki: Switch back to standard logo (T350640) (duration: 07m 11s)
  • 19:33 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:33 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1108.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 19:32 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:30 volans@cumin1001: START - Cookbook sre.hosts.provision for host cp1108.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53240 and previous config saved to /var/cache/conftool/dbconfig/20231109-192850-arnaudb.json
  • 19:28 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:28 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 19:28 urbanecm@deploy2002: urbanecm: Backport for wikimaniawiki: Switch back to standard logo (T350640) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:27 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 19:26 urbanecm@deploy2002: Started scap: Backport for wikimaniawiki: Switch back to standard logo (T350640)
  • 19:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53239 and previous config saved to /var/cache/conftool/dbconfig/20231109-192621-arnaudb.json
  • 19:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 19:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 19:25 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:25 topranks: shutting down et-1/1/5.2201 (sandbox1-a-codfw) interfaces on crX-codfw (T348159)
  • 19:25 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53238 and previous config saved to /var/cache/conftool/dbconfig/20231109-192416-arnaudb.json
  • 19:22 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 19:22 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 19:20 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 19:20 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:20 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:19 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 19:17 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 19:16 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 19:15 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 19:12 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 19:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P53237 and previous config saved to /var/cache/conftool/dbconfig/20231109-190910-arnaudb.json
  • 19:07 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:06 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 19:06 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 19:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:06 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for sandbox1-codfw IPs - cmooney@cumin1001"
  • 19:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:05 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for sandbox1-codfw IPs - cmooney@cumin1001"
  • 19:05 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 19:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 19:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:05 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 19:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief-test2001.codfw.wmnet with OS bookworm
  • 19:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bookworm
  • 19:03 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:02 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bookworm
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P53236 and previous config saved to /var/cache/conftool/dbconfig/20231109-185403-arnaudb.json
  • 18:52 topranks: renumber VRRP GW VIP on crX-codfw for sandbox1-a-codfw (T348159)
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 18:49 topranks: Adding anycast gw config to ssw*codfw for vlan sandbox1-a-codfw (T348159)
  • 18:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 18:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 18:45 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53235 and previous config saved to /var/cache/conftool/dbconfig/20231109-183857-arnaudb.json
  • 18:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53234 and previous config saved to /var/cache/conftool/dbconfig/20231109-183626-arnaudb.json
  • 18:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:36 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53233 and previous config saved to /var/cache/conftool/dbconfig/20231109-183603-arnaudb.json
  • 18:35 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:33 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief-test2001.codfw.wmnet with OS bookworm
  • 18:30 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:30 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:30 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:30 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:23 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching aqs20[09-12].codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 18:23 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:23 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P53232 and previous config saved to /var/cache/conftool/dbconfig/20231109-182057-arnaudb.json
  • 18:18 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[09-12].codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 18:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P53230 and previous config saved to /var/cache/conftool/dbconfig/20231109-180551-arnaudb.json
  • 18:03 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:51 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wdqs::test
  • 17:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53229 and previous config saved to /var/cache/conftool/dbconfig/20231109-175044-arnaudb.json
  • 17:48 fabfur: depooled service ats-be for cp1101 (T349244)
  • 17:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53228 and previous config saved to /var/cache/conftool/dbconfig/20231109-174801-arnaudb.json
  • 17:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53227 and previous config saved to /var/cache/conftool/dbconfig/20231109-174740-arnaudb.json
  • 17:45 fabfur: pooled cp1101 into upload cluster (both cdn and ats-be): T349244
  • 17:41 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wdqs::test
  • 17:38 fabfur: removed cp1076 from HAProxy/Varnish pool (NOT ats-be pool) for T349244
  • 17:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1101.eqiad.wmnet
  • 17:37 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1101.eqiad.wmnet
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P53226 and previous config saved to /var/cache/conftool/dbconfig/20231109-173233-arnaudb.json
  • 17:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: elasticsearch::relforge
  • 17:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P53225 and previous config saved to /var/cache/conftool/dbconfig/20231109-171727-arnaudb.json
  • 17:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: elasticsearch::relforge
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 17:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53224 and previous config saved to /var/cache/conftool/dbconfig/20231109-170220-arnaudb.json
  • 16:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53223 and previous config saved to /var/cache/conftool/dbconfig/20231109-165947-arnaudb.json
  • 16:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53222 and previous config saved to /var/cache/conftool/dbconfig/20231109-165925-arnaudb.json
  • 16:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 16:55 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:aqs-codfw: Applying JVM security upgrade - eevans@cumin1001
  • 16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 16:48 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks write both on enwiki (T345732) (duration: 08m 09s)
  • 16:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P53221 and previous config saved to /var/cache/conftool/dbconfig/20231109-164419-arnaudb.json
  • 16:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:42 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:41 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:41 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks write both on enwiki (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:41 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:40 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks write both on enwiki (T345732)
  • 16:31 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:31 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P53220 and previous config saved to /var/cache/conftool/dbconfig/20231109-162913-arnaudb.json
  • 16:26 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:26 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:23 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Applying JVM security upgrade - eevans@cumin1001
  • 16:20 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Applying JVM security upgrade - eevans@cumin1001
  • 16:14 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53219 and previous config saved to /var/cache/conftool/dbconfig/20231109-161406-arnaudb.json
  • 16:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53218 and previous config saved to /var/cache/conftool/dbconfig/20231109-161134-arnaudb.json
  • 16:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53217 and previous config saved to /var/cache/conftool/dbconfig/20231109-161112-arnaudb.json
  • 16:08 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:07 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:07 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:58 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bookworm
  • 15:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P53216 and previous config saved to /var/cache/conftool/dbconfig/20231109-155606-arnaudb.json
  • 15:47 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:46 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 15:46 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 15:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P53215 and previous config saved to /var/cache/conftool/dbconfig/20231109-154100-arnaudb.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53214 and previous config saved to /var/cache/conftool/dbconfig/20231109-153321-root.json
  • 15:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 15:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Applying JVM security upgrade - eevans@cumin1001
  • 15:29 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53213 and previous config saved to /var/cache/conftool/dbconfig/20231109-152856-root.json
  • 15:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53212 and previous config saved to /var/cache/conftool/dbconfig/20231109-152553-arnaudb.json
  • 15:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1100.eqiad.wmnet
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1100.eqiad.wmnet
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53211 and previous config saved to /var/cache/conftool/dbconfig/20231109-152438-root.json
  • 15:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53210 and previous config saved to /var/cache/conftool/dbconfig/20231109-152320-arnaudb.json
  • 15:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:23 fabfur: cp1100 inserted into cluster_text pool
  • 15:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53209 and previous config saved to /var/cache/conftool/dbconfig/20231109-152259-arnaudb.json
  • 15:21 fabfur: removed cp1075 from HAProxy/Varnish pool (NOT ats-be pool) for T349244
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1017.eqiad.wmnet
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:19 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53208 and previous config saved to /var/cache/conftool/dbconfig/20231109-151816-root.json
  • 15:17 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53207 and previous config saved to /var/cache/conftool/dbconfig/20231109-151351-root.json
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1017.eqiad.wmnet
  • 15:12 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bookworm
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53206 and previous config saved to /var/cache/conftool/dbconfig/20231109-150933-root.json
  • 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P53205 and previous config saved to /var/cache/conftool/dbconfig/20231109-150752-arnaudb.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53204 and previous config saved to /var/cache/conftool/dbconfig/20231109-150311-root.json
  • 15:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 15:00 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:59 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 14:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53203 and previous config saved to /var/cache/conftool/dbconfig/20231109-145846-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53202 and previous config saved to /var/cache/conftool/dbconfig/20231109-145428-root.json
  • 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P53201 and previous config saved to /var/cache/conftool/dbconfig/20231109-145246-arnaudb.json
  • 14:52 kostajh: UTC afternoon deploys done
  • 14:50 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Define virtual domains mapping config (T350321) (duration: 07m 07s)
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53200 and previous config saved to /var/cache/conftool/dbconfig/20231109-144806-root.json
  • 14:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ceph::server
  • 14:46 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1005.eqiad.wmnet with OS bullseye
  • 14:44 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 14:44 kharlan@deploy2002: kharlan and dreamyjazz: Backport for MediaModeration: Define virtual domains mapping config (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53199 and previous config saved to /var/cache/conftool/dbconfig/20231109-144342-root.json
  • 14:43 kharlan@deploy2002: Started scap: Backport for MediaModeration: Define virtual domains mapping config (T350321)
  • 14:41 kharlan@deploy2002: Finished scap: Backport for Revert "CheckUser: Set 'debug' log level" (T345591) (duration: 07m 43s)
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53198 and previous config saved to /var/cache/conftool/dbconfig/20231109-143924-root.json
  • 14:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53197 and previous config saved to /var/cache/conftool/dbconfig/20231109-143739-arnaudb.json
  • 14:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: ceph::server
  • 14:36 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 14:35 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Revert "CheckUser: Set 'debug' log level" (T345591) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53196 and previous config saved to /var/cache/conftool/dbconfig/20231109-143508-arnaudb.json
  • 14:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:34 kharlan@deploy2002: Started scap: Backport for Revert "CheckUser: Set 'debug' log level" (T345591)
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53195 and previous config saved to /var/cache/conftool/dbconfig/20231109-143301-root.json
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::datahub::opensearch
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53194 and previous config saved to /var/cache/conftool/dbconfig/20231109-143254-arnaudb.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2171:3315 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53193 and previous config saved to /var/cache/conftool/dbconfig/20231109-143051-root.json
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53191 and previous config saved to /var/cache/conftool/dbconfig/20231109-142837-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1224 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53190 and previous config saved to /var/cache/conftool/dbconfig/20231109-142621-root.json
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53189 and previous config saved to /var/cache/conftool/dbconfig/20231109-142419-root.json
  • 14:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::datahub::opensearch
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53188 and previous config saved to /var/cache/conftool/dbconfig/20231109-142139-root.json
  • 14:21 moritzm: restarting turnilo on an-tool1007
  • 14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: schema update via T343198
  • 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: schema update via T343198
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::turnilo
  • 14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53187 and previous config saved to /var/cache/conftool/dbconfig/20231109-141749-arnaudb.json
  • 14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::turnilo
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: karapace
  • 14:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53185 and previous config saved to /var/cache/conftool/dbconfig/20231109-140245-arnaudb.json
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: karapace
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host stat1009.eqiad.wmnet
  • 13:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53184 and previous config saved to /var/cache/conftool/dbconfig/20231109-134740-arnaudb.json
  • 13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host stat1009.eqiad.wmnet
  • 13:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53183 and previous config saved to /var/cache/conftool/dbconfig/20231109-133235-arnaudb.json
  • 13:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53180 and previous config saved to /var/cache/conftool/dbconfig/20231109-131730-arnaudb.json
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53179 and previous config saved to /var/cache/conftool/dbconfig/20231109-130225-arnaudb.json
  • 12:44 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T348183)', diff saved to https://phabricator.wikimedia.org/P53178 and previous config saved to /var/cache/conftool/dbconfig/20231109-124404-arnaudb.json
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:43 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:42 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:38 moritzm: installing qemu security updates
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::misccrons
  • 12:23 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 12:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 12:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::misccrons
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::xmldumps
  • 12:09 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::xmldumps
  • 12:08 moritzm: installing python-reportlab security updates
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::xmlfallback
  • 11:43 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::xmlfallback
  • 11:38 _joe_: disabled requestctl cache-text/wikifeeds_featured T350645 T346657
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: url_downloader
  • 11:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: url_downloader
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host urldownloader2003.wikimedia.org
  • 11:09 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 11:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 11:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 11:07 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 11:06 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 11:06 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 11:05 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 11:04 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:04 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:03 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:32 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:31 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:55 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 09:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host urldownloader2003.wikimedia.org
  • 09:51 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 09:51 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 09:41 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
  • 09:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: Deploy 1.42.0-wmf.4 to group2 (labswiki staying at 1.42.0-wmf.3 due to T350836)
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::infrastructure_foundations
  • 09:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::infrastructure_foundations
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::data_engineering
  • 09:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::data_engineering
  • 08:47 godog: add 50G to prometheus/ml-serve in codfw
  • 08:35 Emperor: restart vopsbot.service on alert1001
  • 08:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::dse_k8s_etcd
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2001.codfw.wmnet
  • 08:19 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:19 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2001.codfw.wmnet
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::kubernetes::staging
  • 08:14 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:07 oblivian@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 07:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::kubernetes::staging
  • 07:35 moritzm: installing openjdk-8 security updates
  • 07:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 07:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 07:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db2112 T350142', diff saved to https://phabricator.wikimedia.org/P53177 and previous config saved to /var/cache/conftool/dbconfig/20231109-070936-arnaudb.json
  • 07:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary and set section read-write T350142', diff saved to https://phabricator.wikimedia.org/P53176 and previous config saved to /var/cache/conftool/dbconfig/20231109-070410-arnaudb.json
  • 07:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T350142', diff saved to https://phabricator.wikimedia.org/P53175 and previous config saved to /var/cache/conftool/dbconfig/20231109-070012-arnaudb.json
  • 07:00 arnaudb: Starting s1 codfw failover from db2112 to db2103 - T350142
  • 06:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T350142', diff saved to https://phabricator.wikimedia.org/P53174 and previous config saved to /var/cache/conftool/dbconfig/20231109-062725-arnaudb.json
  • 06:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 06:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142

2023-11-08

  • 23:31 wfan: civicrm upgraded from 81bd4c7d to 88361167
  • 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
  • 22:24 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 22:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 22:24 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 22:24 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 22:14 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 22:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
  • 21:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 21:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[22-24,29,32].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bookworm
  • 20:26 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:25 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:23 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:21 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:21 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:10 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 20:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:57 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
  • 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief-test1001.eqiad.wmnet with OS bookworm
  • 16:54 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:54 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:48 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 16:47 jforrester@deploy2002: Finished scap: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777) (duration: 07m 29s)
  • 16:41 jforrester@deploy2002: jforrester: Continuing with sync
  • 16:40 jforrester@deploy2002: jforrester: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:39 jforrester@deploy2002: Started scap: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777)
  • 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 16:34 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries (duration: 00m 27s)
  • 16:33 ebernhardson@deploy2002: Started deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries
  • 16:31 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 16:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::master
  • 16:23 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bookworm
  • 16:11 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dse_k8s::master
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::worker
  • 16:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
  • 16:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
  • 15:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dse_k8s::worker
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
  • 15:48 bvibber: brion running requeueTranscodes.php on mwmaint2002 to continue backfill for iOS-compatible low-res video (throttled)
  • 15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
  • 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
  • 15:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 15:41 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 15:33 bvibber: brion running requeueTranscodes.php to batch-remove old low-res VP9 WebM transcodes (should be low impact)
  • 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
  • 15:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:26 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:26 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::master
  • 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bookworm
  • 15:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::master
  • 15:04 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" (duration: 06m 51s)
  • 14:59 marostegui@deploy2002: marostegui: Continuing with sync
  • 14:59 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::worker
  • 14:58 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master"
  • 14:53 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:53 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::worker
  • 14:51 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (duration: 08m 41s)
  • 14:46 marostegui@deploy2002: marostegui: Continuing with sync
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::zookeeper
  • 14:44 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc2 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:42 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master
  • 14:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
  • 14:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
  • 14:40 taavi@deploy2002: Finished scap: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509) (duration: 07m 45s)
  • 14:35 _joe_: Running puppet on cp-text to pick up the increase in traffic to mw on k8s
  • 14:35 taavi@deploy2002: taavi and superpes: Continuing with sync
  • 14:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::zookeeper
  • 14:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:34 taavi@deploy2002: taavi and superpes: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 taavi@deploy2002: Started scap: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509)
  • 14:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:31 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 14:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:27 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 14:26 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:26 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:23 taavi@deploy2002: Finished scap: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067) (duration: 12m 19s)
  • 14:17 taavi@deploy2002: taavi and daimona: Continuing with sync
  • 14:12 taavi@deploy2002: taavi and daimona: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 taavi@deploy2002: Started scap: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067)
  • 14:10 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bookworm
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::test::broker
  • 14:04 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:04 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:04 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:03 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::test::broker
  • 13:55 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::hadoop::worker
  • 13:34 moritzm: installing libxpm security updates
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: openldap::replica
  • 13:14 taavi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:14 taavi@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
  • 13:12 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
  • 13:10 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4 refs T350080
  • 13:10 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:04 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: openldap::replica
  • 11:49 ladsgroup@deploy2002: Finished scap: Backport for Only take one field in fetchFieldValues (T350726) (duration: 07m 00s)
  • 11:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:43 ladsgroup@deploy2002: ladsgroup: Backport for Only take one field in fetchFieldValues (T350726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:42 ladsgroup@deploy2002: Started scap: Backport for Only take one field in fetchFieldValues (T350726)
  • 11:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:37 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 11:33 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 11:32 effie: stopping puppet from mc2038
  • 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: analytics_cluster::hadoop::worker
  • 11:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 11:12 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:10 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:10 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:09 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:09 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:08 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-esams and A:cp
  • 11:07 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:07 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:07 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:06 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:04 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:04 btullis@cumin1001: Added views for new wiki: fonwiki T347938
  • 10:43 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::hadoop::worker
  • 10:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::web::htmldumps
  • 10:30 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:25 brouberol@deploy2002: Finished deploy [airflow-dags/analytics@af7f4e5]: (no justification provided) (duration: 00m 31s)
  • 10:24 brouberol@deploy2002: Started deploy [airflow-dags/analytics@af7f4e5]: (no justification provided)
  • 10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::web::htmldumps
  • 10:24 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-esams and A:cp
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-worker1111.eqiad.wmnet
  • 10:06 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-worker1111.eqiad.wmnet
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53170 and previous config saved to /var/cache/conftool/dbconfig/20231108-095701-arnaudb.json
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53169 and previous config saved to /var/cache/conftool/dbconfig/20231108-094156-arnaudb.json
  • 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53167 and previous config saved to /var/cache/conftool/dbconfig/20231108-091146-arnaudb.json
  • 09:02 oblivian@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53166 and previous config saved to /var/cache/conftool/dbconfig/20231108-085641-arnaudb.json
  • 08:55 moritzm: restarting archiva to pick up Java security updates
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45899
  • 08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45899
  • 08:51 moritzm: installing openjdk-8 security updates
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53165 and previous config saved to /var/cache/conftool/dbconfig/20231108-084136-arnaudb.json
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::test_analytics::worker
  • 08:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53164 and previous config saved to /var/cache/conftool/dbconfig/20231108-082631-arnaudb.json
  • 08:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::test_analytics::worker
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::turnilo::staging
  • 07:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::turnilo::staging
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: zookeeper::test
  • 07:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: zookeeper::test
  • 00:16 urbanecm: mwmaint2002: Stop T315510#9312431 instances of extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php (T315510)

2023-11-07

  • 23:09 ladsgroup@deploy2002: Finished scap: Backport for styles: Fix stylesheet validation issues (duration: 07m 14s)
  • 23:04 ladsgroup@deploy2002: ladsgroup and volker-e: Continuing with sync
  • 23:03 ladsgroup@deploy2002: ladsgroup and volker-e: Backport for styles: Fix stylesheet validation issues synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:02 ladsgroup@deploy2002: Started scap: Backport for styles: Fix stylesheet validation issues
  • 22:19 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:18 ladsgroup@deploy2002: Finished scap: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934) (duration: 09m 15s)
  • 22:13 ladsgroup@deploy2002: ladsgroup and volker-e: Continuing with sync
  • 22:10 ladsgroup@deploy2002: ladsgroup and volker-e: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:09 ladsgroup@deploy2002: Started scap: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934)
  • 22:00 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T350703
  • 21:58 tgr@deploy2002: Finished scap: Backport for Fix centralauthtoken key schema migration (T347223 T350723) (duration: 13m 17s)
  • 21:53 tgr@deploy2002: tgr: Continuing with sync
  • 21:46 tgr@deploy2002: tgr: Backport for Fix centralauthtoken key schema migration (T347223 T350723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:45 tgr@deploy2002: Started scap: Backport for Fix centralauthtoken key schema migration (T347223 T350723)
  • 21:37 tgr@deploy2002: Finished scap: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695) (duration: 20m 27s)
  • 21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T350703
  • 21:35 tzatziki: changing email for User:Rlayton-WMF
  • 21:32 tgr@deploy2002: tgr: Continuing with sync
  • 21:18 tgr@deploy2002: tgr: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:17 tgr@deploy2002: Started scap: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695)
  • 21:14 tgr@deploy2002: Finished scap: Backport for Enable edit check on fonwiki (T350634) (duration: 09m 45s)
  • 21:09 tgr@deploy2002: tgr and kemayo: Continuing with sync
  • 21:06 tgr@deploy2002: tgr and kemayo: Backport for Enable edit check on fonwiki (T350634) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:05 tgr@deploy2002: Started scap: Backport for Enable edit check on fonwiki (T350634)
  • 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host stewards1001.eqiad.wmnet
  • 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards1001.eqiad.wmnet with OS bookworm
  • 20:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:18 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) stewards1001.eqiad.wmnet on all recursors
  • 20:18 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache stewards1001.eqiad.wmnet on all recursors
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:16 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:58 wfan: payments-wiki change from 1d66a20f to 8c073c23, config revision changed from c841729a to 5c00d761
  • 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53158 and previous config saved to /var/cache/conftool/dbconfig/20231107-195420-root.json
  • 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53157 and previous config saved to /var/cache/conftool/dbconfig/20231107-193915-root.json
  • 19:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:26 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mail::mx
  • 19:25 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:25 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53156 and previous config saved to /var/cache/conftool/dbconfig/20231107-192410-root.json
  • 19:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:22 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host stewards1001.eqiad.wmnet
  • 19:18 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:17 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mail::mx
  • 19:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards2001.codfw.wmnet with OS bookworm
  • 19:16 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kerberos::kdc
  • 19:12 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53155 and previous config saved to /var/cache/conftool/dbconfig/20231107-190905-root.json
  • 19:06 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: kerberos::kdc
  • 19:04 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: apt_staging
  • 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards2001.codfw.wmnet with reason: host reimage
  • 19:01 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master" (duration: 06m 40s)
  • 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards2001.codfw.wmnet with reason: host reimage
  • 18:57 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: apt_staging
  • 18:56 marostegui@deploy2002: marostegui: Continuing with sync
  • 18:55 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:54 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master"
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53154 and previous config saved to /var/cache/conftool/dbconfig/20231107-185400-root.json
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1192 T346454', diff saved to https://phabricator.wikimedia.org/P53153 and previous config saved to /var/cache/conftool/dbconfig/20231107-185033-root.json
  • 18:44 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc2 master (duration: 06m 47s)
  • 18:39 marostegui@deploy2002: marostegui: Continuing with sync
  • 18:38 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc2 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:37 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc2 master
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Upgrade
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Upgrade
  • 18:33 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards2001.codfw.wmnet with OS bookworm
  • 18:30 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 herron: performing rolling memory increase on logstash collector VMs T350434
  • 18:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:26 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:26 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netmon
  • 18:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netmon
  • 18:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mirrors
  • 18:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:08 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:06 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mirrors
  • 17:21 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.eqiad.wmnet with OS bookworm
  • 17:20 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1005.eqiad.wmnet
  • 17:13 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:13 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:13 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:13 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:12 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:12 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:11 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:11 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:10 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:10 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:08 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.eqiad.wmnet
  • 17:08 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:07 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:03 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:01 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:00 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:58 urbanecm@deploy2002: Finished scap: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157) (duration: 07m 08s)
  • 16:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:52 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 16:52 urbanecm@deploy2002: urbanecm: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:51 urbanecm@deploy2002: Started scap: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157)
  • 16:47 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:42 ottomata: increasing eventgate cpu limits 1000m -> 1500m hopefully to reduce throttling, also setting stream_config_retries: 3 to avoid stream config refetch failures for eventgate-analytics-external.
  • 16:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:12 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::ui::superset
  • 15:58 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:57 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:55 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 15:49 moritzm: importing openjdk-8 8u392-ga-1~deb10u1 for buster-wikimedia to apt.wikimedia.org (latest Java 8 security fixes)
  • 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:48 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:42 bvibber: brion halting requeueTranscodes.php media backfill job insertions for a bit while the queue catches up
  • 15:39 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:39 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:38 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:38 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:34 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:30 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:30 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:29 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:29 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:26 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:25 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:25 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 urbanecm@deploy2002: Finished scap: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn (duration: 07m 37s)
  • 15:21 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:16 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 15:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:15 urbanecm@deploy2002: urbanecm: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:13 urbanecm@deploy2002: Started scap: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn
  • 15:13 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bookworm
  • 15:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::ui::superset
  • 14:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:10 jforrester@deploy2002: Finished scap: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082) (duration: 07m 00s)
  • 14:09 urbanecm: mwmaint2002: Start multiple instances of extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php (T315510#9312431)
  • 14:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:04 jforrester@deploy2002: jforrester: Continuing with sync
  • 14:04 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 jforrester@deploy2002: Started scap: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082)
  • 13:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:49 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:48 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:48 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:24 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::database
  • 13:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 13:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 13:18 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:09 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::database
  • 13:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::frontend
  • 12:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::frontend
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::ui::superset::staging
  • 12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::ui::superset::staging
  • 12:14 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 12:11 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 12:09 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 12:05 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:53 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:48 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:48 btullis@cumin1001: Added views for new wiki: dgawiki T350228
  • 11:48 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::client
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:47 btullis@cumin1001: Added views for new wiki: bjnwikiquote T350234
  • 11:47 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:47 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 11:45 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:45 btullis@cumin1001: Added views for new wiki: bbcwiki T350372
  • 11:45 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:44 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 11:43 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 11:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[1075-1090].eqiad.wmnet} and A:cp
  • 11:37 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::client
  • 11:34 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:33 btullis@cumin1001: Added views for new wiki: zghwiki T350240
  • 11:32 topranks: reset PIC in cr1-eqiad slot 1/1 to enable port et-1/1/2 at 100G for new transport (T350504)
  • 11:27 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::presto::server
  • 11:26 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:21 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:20 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::presto::server
  • 11:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::worker
  • 11:13 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::worker
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::standby
  • 11:03 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[1075-1090].eqiad.wmnet} and A:cp
  • 11:02 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:59 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::standby
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::master
  • 10:36 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 10:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::master
  • 10:33 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::coordinator
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 10:12 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:11 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:11 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::coordinator
  • 10:11 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 10:10 moritzm: installing dbus security updates on bookworm
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 10:04 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:04 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 10:03 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 10:03 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:02 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 09:59 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 09:53 moritzm: installing nss security updates
  • 09:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove drmrs-esams IPs - ayounsi@cumin1001"
  • 09:34 dcausse: restarting blazegraph on wdqs1007 (stuck for 10+hours)
  • 09:34 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove drmrs-esams IPs - ayounsi@cumin1001"
  • 09:32 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:27 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 09:23 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:23 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:22 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4 refs T350080
  • 09:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-codfw and A:cp
  • 09:12 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:00 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:00 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:44 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-codfw and A:cp
  • 06:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 06:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 05:48 kart_: Updated cxserver to 2023-11-06-060744-production (T333969, T350229, T350241, T350373)
  • 05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:55 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.2 (duration: 02m 12s)
  • 04:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.4 refs T350080 (duration: 51m 04s)
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.4 refs T350080
  • 00:32 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=96) for new host stewards2001.codfw.wmnet
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:31 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) stewards2001.codfw.wmnet on all recursors
  • 00:31 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache stewards2001.codfw.wmnet on all recursors
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:30 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 00:27 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host stewards2001.codfw.wmnet

2023-11-06

  • 23:05 ejegg: fundraising civicrm upgraded from 5be02f1b to f1d49e66
  • 23:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 23:02 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 23:01 cjming: end of UTC late backport window
  • 22:58 cjming@deploy2002: Finished scap: Backport for [Languages] Add namespace translations for zgh (duration: 11m 28s)
  • 22:52 cjming@deploy2002: cjming and jhsoby: Continuing with sync
  • 22:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:52 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:48 cjming@deploy2002: cjming and jhsoby: Backport for [Languages] Add namespace translations for zgh synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:46 cjming@deploy2002: Started scap: Backport for [Languages] Add namespace translations for zgh
  • 22:41 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:41 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:34 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:34 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:23 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:05 cjming@deploy2002: Finished scap: Backport for mznwiki: add project namespace (T350397) (duration: 09m 09s)
  • 22:00 cjming@deploy2002: cjming and anzx: Continuing with sync
  • 21:57 cjming@deploy2002: cjming and anzx: Backport for mznwiki: add project namespace (T350397) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:56 cjming@deploy2002: Started scap: Backport for mznwiki: add project namespace (T350397)
  • 21:48 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:47 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:47 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 21:46 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 21:46 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 21:45 cjming@deploy2002: Finished scap: Backport for Avoid nullish coalescing operators (T350519) (duration: 15m 09s)
  • 21:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 21:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 21:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:40 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 21:32 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:31 cjming@deploy2002: jdlrobson and cjming: Backport for Avoid nullish coalescing operators (T350519) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 cjming@deploy2002: Started scap: Backport for Avoid nullish coalescing operators (T350519)
  • 21:30 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:29 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:28 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 ottomata: eventgate-analytics-external - deploy change to remove 'dynamic' stream config support, instead just re-cache stream configs every 60s - https://phabricator.wikimedia.org/T326002
  • 21:24 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:23 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:17 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 21:12 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:12 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 21:00 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 20:54 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:54 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 20:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 20:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 20:46 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 20:41 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:40 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:38 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:32 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 20:12 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:12 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 19:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 19:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 19:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 19:41 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:41 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 ladsgroup@deploy2002: Finished scap: Backport for Add pc4 to the list of ParserCache clusters (T350367) (duration: 09m 32s)
  • 18:48 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy2002: ladsgroup: Backport for Add pc4 to the list of ParserCache clusters (T350367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:40 ladsgroup@deploy2002: Started scap: Backport for Add pc4 to the list of ParserCache clusters (T350367)
  • 18:39 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@048362b]: (no justification provided) (duration: 00m 29s)
  • 18:39 milimetric@deploy2002: Started deploy [airflow-dags/analytics@048362b]: (no justification provided)
  • 18:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:18 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 18:10 milimetric@deploy2002: Finished deploy [analytics/refinery@0239c23] (thin): Publishing refinery-source jars at 0.2.24 (duration: 00m 07s)
  • 18:09 milimetric@deploy2002: Started deploy [analytics/refinery@0239c23] (thin): Publishing refinery-source jars at 0.2.24
  • 18:04 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 milimetric@deploy2002: Finished deploy [analytics/refinery@0239c23]: Publishing refinery-source jars at 0.2.24 (duration: 07m 39s)
  • 18:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:02 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 17:56 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 17:55 milimetric@deploy2002: Started deploy [analytics/refinery@0239c23]: Publishing refinery-source jars at 0.2.24
  • 17:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:52 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:50 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 17:48 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:48 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:46 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:46 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 17:41 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 17:24 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2011.codfw.wmnet with OS bookworm
  • 17:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 17:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 17:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 17:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:03 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 17:01 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:55 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 34s)
  • 16:49 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:49 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 53s)
  • 16:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:49 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:48 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:48 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:45 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:44 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:43 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:43 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 16:41 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:41 ottomata: beginning deployments of eventgate clusters: mesh and cert chart updates, as well as sleep timeout values for graceful envoy+eventgate container termination - T349823 T300033 T346638
  • 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1014.eqiad.wmnet
  • 16:29 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 16:29 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:28 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bookworm
  • 16:26 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1014.eqiad.wmnet
  • 16:10 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1016.eqiad.wmnet with OS bookworm
  • 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2016.codfw.wmnet with OS bookworm
  • 16:02 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bookworm
  • 15:54 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 15:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:46 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 15:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Upgrade
  • 15:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Upgrade
  • 15:44 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:43 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1016.eqiad.wmnet with OS bookworm
  • 15:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2016.codfw.wmnet with OS bookworm
  • 15:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2015.codfw.wmnet with OS bookworm
  • 15:30 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bookworm
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1015.eqiad.wmnet with OS bookworm
  • 15:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2195.codfw.wmnet with OS bookworm
  • 15:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and A:cp
  • 15:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:19 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:18 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 15:17 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:17 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 15:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:10 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:09 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 15:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 15:09 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 15:08 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:08 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: rpkivalidator
  • 15:04 sukhe: finished upgrading all doh* hosts to dnsdist 1.8.2-1+wmf12u2 12
  • 15:01 urbanecm@deploy2002: Finished scap: Backport for Add autopatrol to Wikifunctions Staff group (T350028) (duration: 08m 41s)
  • 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:00 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bookworm
  • 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1015.eqiad.wmnet with OS bookworm
  • 14:56 urbanecm@deploy2002: urbanecm and mdsshakil: Continuing with sync
  • 14:56 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: rpkivalidator
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2015.codfw.wmnet with OS bookworm
  • 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idp
  • 14:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Upgrade
  • 14:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Upgrade
  • 14:55 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 14:54 urbanecm@deploy2002: urbanecm and mdsshakil: Backport for Add autopatrol to Wikifunctions Staff group (T350028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:53 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bookworm
  • 14:53 urbanecm@deploy2002: Started scap: Backport for Add autopatrol to Wikifunctions Staff group (T350028)
  • 14:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idp
  • 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::aux_k8s_etcd
  • 14:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:48 urbanecm@deploy2002: Finished scap: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544) (duration: 13m 13s)
  • 14:47 urbanecm: mwmaint2002: kill persistRevisionThreadItems.php maintenance script for s7 (T315510)
  • 14:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:42 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:42 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::aux_k8s_etcd
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aux_k8s::master
  • 14:37 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:36 urbanecm@deploy2002: urbanecm and tgr and matmarex: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 urbanecm@deploy2002: Started scap: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544)
  • 14:34 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aux_k8s::master
  • 14:34 urbanecm@deploy2002: Finished scap: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration (duration: 11m 34s)
  • 14:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aux_k8s::worker
  • 14:29 urbanecm@deploy2002: matmarex and urbanecm: Continuing with sync
  • 14:27 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aux_k8s::worker
  • 14:27 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:27 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin and A:cp
  • 14:25 vgutierrez: rolling upgrade of HAProxy to version 2.6.15-1~bpo11+1 in eqsin
  • 14:25 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bookworm
  • 14:24 urbanecm@deploy2002: matmarex and urbanecm: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:22 urbanecm@deploy2002: Started scap: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration
  • 14:22 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:22 urbanecm@deploy2002: Finished scap: Backport for CheckUser: Set 'debug' log level (T345591) (duration: 14m 20s)
  • 14:21 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2195.codfw.wmnet with OS bookworm
  • 14:20 stevemunene@cumin1001: END (ERROR) - Cookbook sre.druid.roll-restart-workers (exit_code=97) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:13 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 14:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:08 urbanecm@deploy2002: Started scap: Backport for CheckUser: Set 'debug' log level (T345591)
  • 13:57 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
  • 13:53 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bookworm
  • 13:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: cluster::management
  • 13:45 XioNoX: asw2-c-eqiad> request system power-off member 8 - T349798
  • 13:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 13:30 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:28 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:28 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:27 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:27 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:27 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:26 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:26 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:21 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:21 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:21 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:20 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1136.eqiad.wmnet onto db1236.eqiad.wmnet
  • 13:05 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idm
  • 12:48 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:48 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:44 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idm
  • 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idm_test
  • 12:36 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idm_test
  • 12:20 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2010.codfw.wmnet with OS bookworm
  • 12:14 moritzm: installing jetty9 security updates
  • 12:02 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 11:59 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 11:40 moritzm: installing openssl bugfix updates on Bullseye (update to 1.1.1w)
  • 11:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2193.codfw.wmnet with OS bookworm
  • 11:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 11:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 11:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 11:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-mariadb[1001-1002].eqiad.wmnet with reason: Commissioning new database servers
  • 11:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-mariadb[1001-1002].eqiad.wmnet with reason: Commissioning new database servers
  • 11:02 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bookworm
  • 11:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2192.codfw.wmnet with OS bookworm
  • 10:50 hashar: Restarting Jenkins
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 10:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 10:25 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2192.codfw.wmnet with OS bookworm
  • 10:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2191.codfw.wmnet with OS bookworm
  • 10:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53140 and previous config saved to /var/cache/conftool/dbconfig/20231106-100625-arnaudb.json
  • 10:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 10:05 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1136.eqiad.wmnet onto db1236.eqiad.wmnet
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53139 and previous config saved to /var/cache/conftool/dbconfig/20231106-100213-arnaudb.json
  • 09:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:48 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2191.codfw.wmnet with OS bookworm
  • 09:39 zabe@deploy2002: Finished scap: update interwiki cache (duration: 06m 21s)
  • 09:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2190.codfw.wmnet with OS bookworm
  • 09:35 moritzm: installing Tomcat security updates
  • 09:33 zabe@deploy2002: Started scap: update interwiki cache
  • 09:31 zabe@deploy2002: Finished scap: T350320 (duration: 06m 28s)
  • 09:26 zabe@deploy2002: zabe: Continuing with sync
  • 09:25 zabe@deploy2002: zabe: T350320 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:24 zabe@deploy2002: Started scap: T350320
  • 09:24 moritzm: installing openjdk-11 security updates
  • 09:24 zabe: Toba Batak Wikipedia # T350320
  • 09:22 zabe@deploy2002: Finished scap: T350218 (duration: 07m 04s)
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 09:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 09:17 zabe@deploy2002: zabe: Continuing with sync
  • 09:16 zabe@deploy2002: zabe: T350218 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:15 zabe@deploy2002: Started scap: T350218
  • 09:15 zabe: create Dagaare Wikipedia # T350218
  • 09:13 zabe@deploy2002: Finished scap: T350216 (duration: 07m 15s)
  • 09:10 moritzm: importing openjdk-8 8u392-ga-1~deb11u1 for bullseye-wikimedia to apt.wikimedia.org (latest Java 8 security fixes)
  • 09:08 zabe@deploy2002: zabe: Continuing with sync
  • 09:07 zabe@deploy2002: zabe: T350216 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:06 zabe@deploy2002: Started scap: T350216
  • 09:06 zabe: create Moroccan Amazigh Wikipedia # T350216
  • 09:04 zabe@deploy2002: Finished scap: T350217 (duration: 07m 47s)
  • 09:00 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bookworm
  • 08:58 zabe@deploy2002: zabe: Continuing with sync
  • 08:57 zabe@deploy2002: zabe: T350217 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:56 zabe@deploy2002: Started scap: T350217
  • 08:56 zabe: create Banjar Wikiquote # T350217
  • 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS bookworm
  • 08:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 08:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 08:31 godog: add +80G to prometheus/ops in eqiad
  • 08:25 urbanecm@deploy2002: Finished scap: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024) (duration: 23m 37s)
  • 08:15 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2189.codfw.wmnet with OS bookworm
  • 08:14 urbanecm@deploy2002: urbanecm: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:01 urbanecm@deploy2002: Started scap: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024)

2023-11-03

  • 19:15 cstone: payments-wiki upgraded from cf9f8e52 to 1d66a20f
  • 18:08 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2011.codfw.wmnet with OS bookworm
  • 17:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 29 days, 4:00:00 on cp4052.ulsfo.wmnet with reason: testing instance
  • 17:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 29 days, 4:00:00 on cp4052.ulsfo.wmnet with reason: testing instance
  • 16:46 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bookworm
  • 16:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 16:16 jynus@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup2010.codfw.wmnet with OS bookworm
  • 15:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2188.codfw.wmnet with OS bookworm
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 15:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 15:36 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 15:20 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2188.codfw.wmnet with OS bookworm
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2188.codfw.wmnet with reason: reimage via T343674
  • 15:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2188.codfw.wmnet with reason: reimage via T343674
  • 15:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:50 topranks: moving cr1-codfw <-> ssw1-a1-codfw EBGP session to private1-b-codfw IPs T347191
  • 14:40 topranks: adding irb interface in private1-a-codfw vlan to ssw1-a1-codfw T347191
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:02 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudelastic1005.wikimedia.org
  • 13:50 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudelastic1005.wikimedia.org
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 13:00 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bookworm
  • 12:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1006.eqiad.wmnet
  • 12:48 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1006.eqiad.wmnet
  • 12:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:42 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:17 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 12:17 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bookworm
  • 11:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bookworm
  • 11:49 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 11:49 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1010.eqiad.wmnet with OS bookworm
  • 11:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 11:30 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 11:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:21 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 11:08 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bookworm
  • 11:07 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1011.eqiad.wmnet with OS bookworm
  • 10:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 10:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 10:34 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bookworm
  • 10:08 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 09:59 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 09:59 Emperor: roll-restart swift frontends
  • 04:01 eileen: civicrm upgraded from 84ec2957 to 5be02f1b
  • 01:23 thcipriani@deploy2002: Finished scap: Backport for Disable namespaceDupes.php for now (T350443) (duration: 10m 29s)
  • 01:18 thcipriani@deploy2002: thcipriani: Continuing with sync
  • 01:14 thcipriani@deploy2002: thcipriani: Backport for Disable namespaceDupes.php for now (T350443) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:13 thcipriani@deploy2002: Started scap: Backport for Disable namespaceDupes.php for now (T350443)

2023-11-02

  • 22:31 Amir1: killed update collation on s5
  • 22:13 brett: import acme-chief 0.36-2 into bookworm-wikimedia repo
  • 21:22 inflatador: bking@cumin2002 enabling elastic snapshots on eqiad clusters T348686
  • 20:32 mabualruz@deploy2002: Finished scap: Backport for Enable native math rendering mode on testwiki (T311620) (duration: 14m 06s)
  • 20:27 mabualruz@deploy2002: mabualruz and physikerwelt: Continuing with sync
  • 20:20 mabualruz@deploy2002: mabualruz and physikerwelt: Backport for Enable native math rendering mode on testwiki (T311620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 mabualruz@deploy2002: Started scap: Backport for Enable native math rendering mode on testwiki (T311620)
  • 20:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for codfw CR IPs moved to new interfaces. - cmooney@cumin1001"
  • 20:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for codfw CR IPs moved to new interfaces. - cmooney@cumin1001"
  • 19:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bookworm
  • 18:46 topranks: shutting down uplink from asw-b-codfw et-2/0/51 to cr1-codfw in advance of cable move (T347191)
  • 18:44 topranks: Making cr2-codfw VRRP Master for row B traffic over new link from ssw1-a8-codfw (T347191)
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 18:22 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.3 refs T348356
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh1001.wikimedia.org with OS bookworm
  • 18:21 topranks: Shutting asw-b-codfw uplink to cr2-codfw down in advance of cable move (T347191)
  • 18:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 topranks: Making cr1-codfw VRRP Master for row A traffic again on ssw1-a1-codfw interface (T347191)
  • 17:50 topranks: Shutting asw-a-codfw uplink to cr1-codfw down in advance of cable move (T347191)
  • 17:45 topranks: Moving row A outbound traffic from direct CR link to routing via Spine (T347191)
  • 17:45 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 17:42 vgutierrez: repool cp4051 and cp5030
  • 17:40 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:40 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:23 vgutierrez: depool cp5030
  • 17:19 vgutierrez: restart haproxy on cp4051
  • 17:14 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:14 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 17:13 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:13 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:12 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:11 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 17:11 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:10 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 17:06 topranks: shutting down uplink from asw-a-codfw et-7/0/52 to cr2-codfw et-1/0/0 (T347191)
  • 17:05 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Move row A/B CR uplinks to SPINE switches
  • 17:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Move row A/B CR uplinks to SPINE switches
  • 17:02 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 16:59 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:57 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 16:40 vgutierrez: depool cp4051
  • 16:35 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:35 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:31 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:30 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:30 ottomata: eventgate-analytics-external: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 16:30 ottomata: eventgate-analytics in codfw: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 16:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:26 fabfur: haproxy: this change https://gerrit.wikimedia.org/r/c/operations/puppet/+/971228 will be propagated soon to all cp-ulsfo hosts (T348851)
  • 16:07 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:06 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:51 ottomata: eventgate-analytics in eqiad: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 15:50 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:50 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:48 sukhe: sudo cumin 'O:prometheus' 'run-puppet-agent'
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 15:40 fabfur: cp4037 repooling with changes for dedicated healthcheck backend (haproxy): https://gerrit.wikimedia.org/r/c/operations/puppet/+/966221/ (T348851)
  • 15:34 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:27 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:26 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:17 fabfur: cp4037 depooled to be used as canary for https://gerrit.wikimedia.org/r/c/operations/puppet/+/966221/
  • 15:02 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 14:56 herron: logstash1025 systemctl restart apache2.service T350402
  • 14:51 sukhe: force agent run on A:wikidough
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::standalone
  • 14:35 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::standalone
  • 14:32 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: installserver
  • 14:32 hashar: Restarting CI Jenkins again for plugins removal
  • 14:15 hashar: Restarting CI Jenkins for plugins adjustments
  • 13:50 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: installserver
  • 13:43 jayme@deploy2002: Finished scap: upgrading ICU67 (duration: 15m 10s)
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host install6002.wikimedia.org
  • 13:34 sukhe: restart pybal on lvs1020
  • 13:29 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host install6002.wikimedia.org
  • 13:27 jayme@deploy2002: Started scap: upgrading ICU67
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netinsights
  • 13:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netinsights
  • 12:59 moritzm: upgrading deployment servers to ICU67 T345561
  • 12:46 jayme: running fleet wide php upgrades - T345561
  • 12:46 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: ganeti
  • 12:43 daniel@deploy2002: Finished scap: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001) (duration: 21m 37s)
  • 12:38 moritzm: upgrading snapshot* to ICU67 T345561
  • 12:37 daniel@deploy2002: daniel: Continuing with sync
  • 12:36 daniel@deploy2002: daniel: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:31 moritzm: upgrading cloudweb to ICU67 T345561
  • 12:21 daniel@deploy2002: Started scap: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001)
  • 12:20 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: ganeti
  • 11:58 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host netflow6001.drmrs.wmnet
  • 11:54 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 11:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:49 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 11:48 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host netflow6001.drmrs.wmnet
  • 11:46 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 11:45 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:33 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 11:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 11:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 11:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 11:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 11:12 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 11:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:10 vgutierrez: rolling upgrade of HAProxy to version 2.6.15-1~bpo11+1 in ulsfo
  • 11:09 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:00 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: testing new bgp policy
  • 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: testing new bgp policy
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ganeti2014.codfw.wmnet
  • 10:26 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 10:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ganeti2014.codfw.wmnet
  • 09:32 moritzm: installing openssl bugfix updates from Bullseye point release (update to 1.1.1w)
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 09:13 jayme: published image php7.4-fpm-multiversion-base:7.4.33-6 now based on icu67 php packages - T345561
  • 09:06 zabe@deploy2002: Finished scap: Backport for Update Netskope IP ranges (T350199) (duration: 07m 25s)
  • 09:05 moritzm: installing krb5 security updates on buster/bullseye/bookworm
  • 09:04 moritzm: installing krb5 security updates on bullseye
  • 09:01 zabe@deploy2002: zabe: Continuing with sync
  • 09:00 zabe@deploy2002: zabe: Backport for Update Netskope IP ranges (T350199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:59 zabe@deploy2002: Started scap: Backport for Update Netskope IP ranges (T350199)
  • 08:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: decommissioning via T348956
  • 08:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: decommissioning via T348956
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcontrol2006-dev.codfw.wmnet
  • 08:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcontrol2006-dev.codfw.wmnet
  • 08:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 07:05 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 07:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 03:13 eileen: civicrm upgraded from 86b620ef to 84ec2957
  • 03:03 eileen: civicrm upgraded from bcfd8a7e to 86b620ef
  • 02:27 eileen: civicrm upgraded from 60bdd8d3 to bcfd8a7e
  • 02:18 eileen: civicrm upgraded from 770b114c to 60bdd8d3

2023-11-01

  • 22:39 urbanecm@deploy2002: Finished scap: Backport for Revert "Add খসড়া as draft namespace alias on bnwiki" and add "খসড়া" by copy-paste from wiki page (duration: 06m 44s)
  • 22:32 urbanecm@deploy2002: Started scap: Backport for Revert "Add খসড়া as draft namespace alias on bnwiki" and add "খসড়া" by copy-paste from wiki page
  • 22:18 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 22:18 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:18 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:16 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:15 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:15 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:13 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:06 urbanecm@deploy2002: Finished scap: Backport for Add খসড়া as draft namespace alias on bnwiki (duration: 09m 34s)
  • 22:00 urbanecm@deploy2002: mdsshakil and urbanecm: Continuing with sync
  • 21:57 urbanecm@deploy2002: mdsshakil and urbanecm: Backport for Add খসড়া as draft namespace alias on bnwiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:56 urbanecm@deploy2002: Started scap: Backport for Add খসড়া as draft namespace alias on bnwiki
  • 21:14 topranks: adjust BGP policy out to L3 switches on remaining CRs T344547
  • 20:51 urbanecm: mwmaint2002: mwscript namespaceDupes.php bnwiki --fix --add-prefix BROKEN
  • 20:49 topranks: configure esams switches to load-share default across CRs T344547
  • 20:23 cjming: end of UTC late backport window
  • 20:23 topranks: adjusting routes announced to L3 switches in esams T344547
  • 20:21 cjming@deploy2002: Finished scap: Backport for Create Draft namespace on bnwiki (T350133) (duration: 13m 38s)
  • 20:15 cjming@deploy2002: mdsshakil and cjming: Continuing with sync
  • 20:08 cjming@deploy2002: mdsshakil and cjming: Backport for Create Draft namespace on bnwiki (T350133) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:07 cjming@deploy2002: Started scap: Backport for Create Draft namespace on bnwiki (T350133)
  • 19:44 topranks: adjusting routes announced to L3 switches in codfw T344547
  • 18:17 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 18:16 sukhe: upgrade doh4001 to dnsdist 1.8.2-1+wmf12u2
  • 18:15 dduvall@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.3 refs T348356 (duration: 05m 39s)
  • 18:14 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.2-1+wmf12u2_amd64.changes
  • 18:10 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.3 refs T348356
  • 18:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 17:41 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt-wdqs1002.eqiad.wmnet
  • 17:41 taavi@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudvirt-wdqs1002.eqiad.wmnet
  • 17:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 17:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 17:27 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:21 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:21 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: still setting up
  • 17:21 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: still setting up
  • 17:20 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 17:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 17:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 17:14 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:14 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 17:12 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 17:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 17:01 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp1114.eqiad.wmnet with OS bullseye
  • 16:59 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:57 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:54 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-f1-eqiad.mgmt,ssw1-e1-eqiad.mgmt with reason: replacing optics to troubleshoot errors on core switch link
  • 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-f1-eqiad.mgmt,ssw1-e1-eqiad.mgmt with reason: replacing optics to troubleshoot errors on core switch link
  • 16:53 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:52 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 16:40 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:40 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 16:30 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:25 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:25 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:18 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:18 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 16:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: moving switch link from NIC port 2 to port 1
  • 16:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: moving switch link from NIC port 2 to port 1
  • 16:03 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 15:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp1113.eqiad.wmnet with OS bullseye
  • 15:57 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:56 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:26 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1113.eqiad.wmnet with OS bookworm
  • 15:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bookworm
  • 15:23 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 15:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 15:05 urbanecm: mwmaint2002: mwscript userOptions.php --wiki=WIKI --nowarn --old='oldimpact' --new='control' 'growthexperiments-homepage-variant' # end A/B testing of new Impact (T336203; wikis=arwiki bnwiki elwiki eswiki fawiki frwiki frwiktionary idwiki plwiki rowiki trwiki viwiki)
  • 15:02 urbanecm@deploy2002: Finished scap: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203) (duration: 09m 44s)
  • 15:00 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:59 urbanecm: mwmaint2002: mwscript userOptions.php --wiki=cswiki --nowarn --old='oldimpact' --new='control' 'growthexperiments-homepage-variant' # end A/B testing of new Impact (T336203)
  • 14:57 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:57 urbanecm@deploy2002: urbanecm: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:55 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:52 urbanecm@deploy2002: Started scap: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203)
  • 14:52 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable new Impact module on all Wikipedias (T336203) (duration: 10m 41s)
  • 14:49 ejegg: fundraising python tools upgraded from 65f101e4 to a4cbbbe7
  • 14:46 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:45 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable new Impact module on all Wikipedias (T336203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 urbanecm@deploy2002: Started scap: Backport for Growth: Enable new Impact module on all Wikipedias (T336203)
  • 14:15 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:15 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:14 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:14 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 14:06 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 13:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp1100.eqiad.wmnet with reason: not pooled, reimaging in progress
  • 13:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp1100.eqiad.wmnet with reason: not pooled, reimaging in progress
  • 13:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:52 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:33 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:32 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:29 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:25 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:22 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:20 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 13:11 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 13:09 moritzm: installing libx11 security updates
  • 13:01 moritzm: installing glib2.0 security updates
  • 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 12:43 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 12:31 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration in s4 to write both (T345732) (duration: 09m 12s)
  • 12:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:23 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration in s4 to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:22 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration in s4 to write both (T345732)
  • 12:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:33 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:22 moritzm: installing adduser security updates
  • 10:10 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 10:03 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 10:02 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 09:57 moritzm: installing yajl security updates
  • 09:46 moritzm: installing ncurses security updates
  • 09:28 moritzm: installing RT security updates
  • 09:11 moritzm: installing curl security updates
  • 08:34 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 06:01 kart_: Updated MinT to 2023-10-31-044726-production (T333969, T349991, T349079, T340507)
  • 05:57 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:51 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:46 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:40 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:32 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:29 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 00:51 eileen: civicrm upgraded from 31d53b57 to 6ae3d3fc
  • 00:01 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
