Server Admin Log/Archive 70

From Wikitech

2023-08-31

  • 23:53 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 23:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 23:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 22:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bookworm
  • 21:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 21:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 21:25 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2037.codfw.wmnet with OS bullseye
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2003.codfw.wmnet
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 21:07 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 21:04 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:01 jhuneidi@deploy1002: Finished scap: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365) (duration: 10m 03s)
  • 21:00 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2003.codfw.wmnet
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 20:55 jhuneidi@deploy1002: arlolra and jhuneidi: Continuing with sync
  • 20:52 jhuneidi@deploy1002: arlolra and jhuneidi: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2001.codfw.wmnet
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:51 jhuneidi@deploy1002: Started scap: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365)
  • 20:50 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:47 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh5002.wikimedia.org with OS bookworm
  • 20:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh6002.wikimedia.org
  • 20:45 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh6002.wikimedia.org
  • 20:43 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2001.codfw.wmnet
  • 20:43 jhuneidi@deploy1002: Finished scap: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031) (duration: 07m 01s)
  • 20:37 jhuneidi@deploy1002: jhuneidi and matmarex: Continuing with sync
  • 20:37 jhuneidi@deploy1002: jhuneidi and matmarex: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:36 jhuneidi@deploy1002: Started scap: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031)
  • 20:34 jhuneidi@deploy1002: Finished scap: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158) (duration: 14m 19s)
  • 20:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bookworm
  • 20:29 jhuneidi@deploy1002: jhuneidi and dani: Continuing with sync
  • 20:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:27 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:27 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:26 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:26 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:25 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:21 jhuneidi@deploy1002: jhuneidi and dani: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:20 jhuneidi@deploy1002: Started scap: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158)
  • 20:18 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:17 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:16 inflatador: 'bking@wdqs1004 depool wdqs1004 to test script changes T342361'
  • 20:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1005.eqiad.wmnet
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 20:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 20:07 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 20:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 19:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 19:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh6002.wikimedia.org with OS bookworm
  • 19:44 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 19:33 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - cmooney@cumin1001"
  • 19:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:28 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1005.eqiad.wmnet
  • 19:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 19:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 19:03 ryankemper: T344198 on `ryankemper@cumin1001`: `sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T344198"'`
  • 19:03 ryankemper: T344198 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
  • 18:56 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.24 refs T343726
  • 17:12 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 17:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:03 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:42 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host restbase1030.eqiad.wmnet
  • 16:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - cmooney@cumin1001"
  • 16:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - cmooney@cumin1001"
  • 16:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52236 and previous config saved to /var/cache/conftool/dbconfig/20230831-161736-ladsgroup.json
  • 16:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52235 and previous config saved to /var/cache/conftool/dbconfig/20230831-160230-ladsgroup.json
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 15:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cloudservices1006.eqiad.wmnet with reason: service bootstrap
  • 15:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on cloudservices1006.eqiad.wmnet with reason: service bootstrap
  • 15:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 15:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T336380)
  • 15:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 15:48 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T336380)
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52234 and previous config saved to /var/cache/conftool/dbconfig/20230831-154724-ladsgroup.json
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 15:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T336380)
  • 15:44 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T336380)
  • 15:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 15:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 15:39 moritzm: failover ganeti master in ulsfo to ganeti4005
  • 15:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 15:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52233 and previous config saved to /var/cache/conftool/dbconfig/20230831-153217-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52232 and previous config saved to /var/cache/conftool/dbconfig/20230831-153005-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52231 and previous config saved to /var/cache/conftool/dbconfig/20230831-152943-ladsgroup.json
  • 15:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52230 and previous config saved to /var/cache/conftool/dbconfig/20230831-152710-root.json
  • 15:24 jynus: extend backup1009 lv by additional 10TiB
  • 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 15:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 15:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52229 and previous config saved to /var/cache/conftool/dbconfig/20230831-151437-ladsgroup.json
  • 15:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1010.eqiad.wmnet with OS bullseye
  • 15:12 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1001"
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52228 and previous config saved to /var/cache/conftool/dbconfig/20230831-151205-root.json
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 15:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 15:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52227 and previous config saved to /var/cache/conftool/dbconfig/20230831-145931-ladsgroup.json
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52226 and previous config saved to /var/cache/conftool/dbconfig/20230831-145700-root.json
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 14:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 14:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52225 and previous config saved to /var/cache/conftool/dbconfig/20230831-144425-ladsgroup.json
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52224 and previous config saved to /var/cache/conftool/dbconfig/20230831-144155-root.json
  • 14:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testreduce1002.eqiad.wmnet
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testreduce1002.eqiad.wmnet with OS bookworm
  • 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 14:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52223 and previous config saved to /var/cache/conftool/dbconfig/20230831-142651-root.json
  • 14:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52222 and previous config saved to /var/cache/conftool/dbconfig/20230831-142445-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52221 and previous config saved to /var/cache/conftool/dbconfig/20230831-142424-ladsgroup.json
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 14:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 14:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bookworm
  • 14:16 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:16 sgimeno@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139) (duration: 07m 34s)
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52220 and previous config saved to /var/cache/conftool/dbconfig/20230831-141547-ladsgroup.json
  • 14:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 14:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52219 and previous config saved to /var/cache/conftool/dbconfig/20230831-141146-root.json
  • 14:10 sgimeno@deploy1002: sgimeno: Continuing with sync
  • 14:10 sgimeno@deploy1002: sgimeno: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52217 and previous config saved to /var/cache/conftool/dbconfig/20230831-140917-ladsgroup.json
  • 14:09 sgimeno@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)
  • 14:07 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 14:07 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 07m 36s)
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 14:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 14:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 14:05 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1001"
  • 14:01 sgimeno@deploy1002: sgimeno and soda: Continuing with sync
  • 14:01 sgimeno@deploy1002: sgimeno and soda: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52216 and previous config saved to /var/cache/conftool/dbconfig/20230831-140041-ladsgroup.json
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testreduce1002.eqiad.wmnet with OS bookworm
  • 13:59 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testreduce1002.eqiad.wmnet on all recursors
  • 13:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache testreduce1002.eqiad.wmnet on all recursors
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52215 and previous config saved to /var/cache/conftool/dbconfig/20230831-135641-root.json
  • 13:56 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 09m 33s)
  • 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52214 and previous config saved to /var/cache/conftool/dbconfig/20230831-135411-ladsgroup.json
  • 13:54 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 13:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:53 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testreduce1002.eqiad.wmnet
  • 13:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
  • 13:49 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
  • 13:49 sgimeno@deploy1002: soda and sgimeno: Continuing with sync
  • 13:48 sgimeno@deploy1002: soda and sgimeno: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 13:47 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 13:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52213 and previous config saved to /var/cache/conftool/dbconfig/20230831-134535-ladsgroup.json
  • 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 13:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
  • 13:42 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 15m 00s)
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52212 and previous config saved to /var/cache/conftool/dbconfig/20230831-134136-root.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52211 and previous config saved to /var/cache/conftool/dbconfig/20230831-133905-ladsgroup.json
  • 13:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 13:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 13:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 13:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: host reimage
  • 13:36 jbond: swap puppetdb-api and puppetdb-api-next gerrit:940384
  • 13:35 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device ssw1-f1-eqiad
  • 13:35 sgimeno@deploy1002: sgimeno and soda: Continuing with sync
  • 13:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 13:35 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
  • 13:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 13:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
  • 13:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh2002.wikimedia.org with OS bookworm
  • 13:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: host reimage
  • 13:31 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
  • 13:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52210 and previous config saved to /var/cache/conftool/dbconfig/20230831-133029-ladsgroup.json
  • 13:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 13:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:29 sgimeno@deploy1002: sgimeno and soda: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52209 and previous config saved to /var/cache/conftool/dbconfig/20230831-132820-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1119.eqiad.wmnet
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52208 and previous config saved to /var/cache/conftool/dbconfig/20230831-132759-ladsgroup.json
  • 13:27 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:25 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 13:25 sgimeno@deploy1002: Finished scap: Backport for Remove rc1.mediawiki.page_content_change stream (T307959) (duration: 10m 33s)
  • 13:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 13:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 13:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 13:23 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 13:20 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1010.eqiad.wmnet with OS bullseye
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52207 and previous config saved to /var/cache/conftool/dbconfig/20230831-132009-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52206 and previous config saved to /var/cache/conftool/dbconfig/20230831-131947-ladsgroup.json
  • 13:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 13:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 13:17 sgimeno@deploy1002: gmodena and sgimeno: Continuing with sync
  • 13:16 sgimeno@deploy1002: gmodena and sgimeno: Backport for Remove rc1.mediawiki.page_content_change stream (T307959) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:14 sgimeno@deploy1002: Started scap: Backport for Remove rc1.mediawiki.page_content_change stream (T307959)
  • 13:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:13 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52205 and previous config saved to /var/cache/conftool/dbconfig/20230831-131252-ladsgroup.json
  • 13:12 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 13:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52204 and previous config saved to /var/cache/conftool/dbconfig/20230831-130441-ladsgroup.json
  • 13:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
  • 13:02 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 13:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52203 and previous config saved to /var/cache/conftool/dbconfig/20230831-125746-ladsgroup.json
  • 12:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 12:54 lucaswerkmeister-wmde: Deployed security patch for T345064
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - cmooney@cumin1001"
  • 12:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 12:53 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - cmooney@cumin1001"
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52202 and previous config saved to /var/cache/conftool/dbconfig/20230831-124934-ladsgroup.json
  • 12:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:49 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:49 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw
  • 12:49 cmooney@cumin1001: START - Cookbook sre.network.tls for network device ssw1-a1-codfw
  • 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 12:47 lucaswerkmeister-wmde: Deployed security patch for T345064
  • 12:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 12:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52201 and previous config saved to /var/cache/conftool/dbconfig/20230831-124240-ladsgroup.json
  • 12:42 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 12:36 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw
  • 12:35 cmooney@cumin1001: START - Cookbook sre.network.tls for network device ssw1-a8-codfw
  • 12:35 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@06203c0] (duration: 03m 07s)
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52200 and previous config saved to /var/cache/conftool/dbconfig/20230831-123428-ladsgroup.json
  • 12:32 aqu@deploy1002: Started deploy [analytics/refinery@06203c0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@06203c0]
  • 12:32 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0] (thin): Regular analytics weekly train THIN [analytics/refinery@06203c0] (duration: 00m 04s)
  • 12:32 aqu@deploy1002: Started deploy [analytics/refinery@06203c0] (thin): Regular analytics weekly train THIN [analytics/refinery@06203c0]
  • 12:31 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308) (duration: 27m 05s)
  • 12:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 12:27 jayme: restarting pybal on lvs1019 - T325178
  • 12:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 12:26 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52199 and previous config saved to /var/cache/conftool/dbconfig/20230831-122502-ladsgroup.json
  • 12:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 jayme: restarting pybal on lvs1020 - T325178
  • 12:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52198 and previous config saved to /var/cache/conftool/dbconfig/20230831-122441-ladsgroup.json
  • 12:23 ladsgroup@deploy1002: isaranto and ladsgroup: Continuing with sync
  • 12:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:21 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:19 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:18 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 12:18 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52197 and previous config saved to /var/cache/conftool/dbconfig/20230831-121721-ladsgroup.json
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52196 and previous config saved to /var/cache/conftool/dbconfig/20230831-121654-ladsgroup.json
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:15 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0]: Regular analytics weekly train [analytics/refinery@06203c0] (duration: 12m 15s)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:11 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:11 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a8-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52195 and previous config saved to /var/cache/conftool/dbconfig/20230831-120935-ladsgroup.json
  • 12:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:05 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:03 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308)
  • 12:03 aqu@deploy1002: Started deploy [analytics/refinery@06203c0]: Regular analytics weekly train [analytics/refinery@06203c0]
  • 12:02 aqu: About to deploy analytics refinery (weekly train)
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52194 and previous config saved to /var/cache/conftool/dbconfig/20230831-120148-ladsgroup.json
  • 11:59 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster1002.eqiad.wmnet
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52193 and previous config saved to /var/cache/conftool/dbconfig/20230831-115429-ladsgroup.json
  • 11:48 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 11:46 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1119.eqiad.wmnet
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52192 and previous config saved to /var/cache/conftool/dbconfig/20230831-114642-ladsgroup.json
  • 11:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:46 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 11:44 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster1001.eqiad.wmnet
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52191 and previous config saved to /var/cache/conftool/dbconfig/20230831-113922-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52190 and previous config saved to /var/cache/conftool/dbconfig/20230831-113613-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52189 and previous config saved to /var/cache/conftool/dbconfig/20230831-113603-ladsgroup.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P52187 and previous config saved to /var/cache/conftool/dbconfig/20230831-113324-root.json
  • 11:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52186 and previous config saved to /var/cache/conftool/dbconfig/20230831-113136-ladsgroup.json
  • 11:31 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52185 and previous config saved to /var/cache/conftool/dbconfig/20230831-112057-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52184 and previous config saved to /var/cache/conftool/dbconfig/20230831-111353-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343718)', diff saved to https://phabricator.wikimedia.org/P52183 and previous config saved to /var/cache/conftool/dbconfig/20230831-111332-ladsgroup.json
  • 11:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 11:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1025.eqiad.wmnet
  • 11:06 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes1025.eqiad.wmnet
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52182 and previous config saved to /var/cache/conftool/dbconfig/20230831-110551-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52181 and previous config saved to /var/cache/conftool/dbconfig/20230831-105826-ladsgroup.json
  • 10:54 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1006.eqiad.wmnet
  • 10:54 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52180 and previous config saved to /var/cache/conftool/dbconfig/20230831-105044-ladsgroup.json
  • 10:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:50 moritzm: installing flask security updates on buster
  • 10:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52179 and previous config saved to /var/cache/conftool/dbconfig/20230831-104836-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52178 and previous config saved to /var/cache/conftool/dbconfig/20230831-104815-ladsgroup.json
  • 10:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:47 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@90f280e]: (no justification provided) (duration: 00m 09s)
  • 10:47 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@90f280e]: (no justification provided)
  • 10:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:46 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:46 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52177 and previous config saved to /var/cache/conftool/dbconfig/20230831-104319-ladsgroup.json
  • 10:43 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 10:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:41 moritzm: installing cjose security updates
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Maintenance
  • 10:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 10:37 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 10:34 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52176 and previous config saved to /var/cache/conftool/dbconfig/20230831-103308-ladsgroup.json
  • 10:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
  • 10:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:22 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52174 and previous config saved to /var/cache/conftool/dbconfig/20230831-101802-ladsgroup.json
  • 10:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 10:16 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-drmrs
  • 10:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 10:15 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
  • 10:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 10:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 10:08 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T343718)', diff saved to https://phabricator.wikimedia.org/P52173 and previous config saved to /var/cache/conftool/dbconfig/20230831-100811-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52172 and previous config saved to /var/cache/conftool/dbconfig/20230831-100750-ladsgroup.json
  • 10:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52171 and previous config saved to /var/cache/conftool/dbconfig/20230831-100256-ladsgroup.json
  • 10:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-codfw
  • 10:00 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52170 and previous config saved to /var/cache/conftool/dbconfig/20230831-100047-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52169 and previous config saved to /var/cache/conftool/dbconfig/20230831-100026-ladsgroup.json
  • 09:59 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
  • 09:57 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-codfw
  • 09:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
  • 09:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52168 and previous config saved to /var/cache/conftool/dbconfig/20230831-095244-ladsgroup.json
  • 09:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:51 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 09:51 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: sync
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: sync
  • 09:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52167 and previous config saved to /var/cache/conftool/dbconfig/20230831-094520-ladsgroup.json
  • 09:45 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1017.eqiad.wmnet
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52166 and previous config saved to /var/cache/conftool/dbconfig/20230831-093738-ladsgroup.json
  • 09:37 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: fix arwiki likelybad threshold (T345305) (duration: 68m 57s)
  • 09:35 moritzm: imported cas 6.6.11+wmf11u1 to apt.wikimedia.org
  • 09:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1017.eqiad.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52165 and previous config saved to /var/cache/conftool/dbconfig/20230831-093013-ladsgroup.json
  • 09:24 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1016.eqiad.wmnet
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52164 and previous config saved to /var/cache/conftool/dbconfig/20230831-092231-ladsgroup.json
  • 09:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-eqsin
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52163 and previous config saved to /var/cache/conftool/dbconfig/20230831-091507-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52162 and previous config saved to /var/cache/conftool/dbconfig/20230831-091258-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52161 and previous config saved to /var/cache/conftool/dbconfig/20230831-091237-ladsgroup.json
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-eqsin
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-esams
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-esams
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin
  • 09:03 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1015.eqiad.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52160 and previous config saved to /var/cache/conftool/dbconfig/20230831-090244-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52159 and previous config saved to /var/cache/conftool/dbconfig/20230831-090223-ladsgroup.json
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqsin
  • 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqord
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52158 and previous config saved to /var/cache/conftool/dbconfig/20230831-085731-ladsgroup.json
  • 08:56 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqord
  • 08:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqiad
  • 08:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:56 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqiad
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqdfw
  • 08:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:50 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1014.eqiad.wmnet
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqdfw
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-drmrs
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52157 and previous config saved to /var/cache/conftool/dbconfig/20230831-084717-ladsgroup.json
  • 08:43 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-drmrs
  • 08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-codfw
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52156 and previous config saved to /var/cache/conftool/dbconfig/20230831-084224-ladsgroup.json
  • 08:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 08:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 08:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
  • 08:38 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-codfw
  • 08:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-esams
  • 08:36 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 08:36 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
  • 08:36 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: fix arwiki likelybad threshold (T345305) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:33 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-esams
  • 08:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-eqiad
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52155 and previous config saved to /var/cache/conftool/dbconfig/20230831-083211-ladsgroup.json
  • 08:30 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
  • 08:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
  • 08:28 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-eqiad
  • 08:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr4-ulsfo
  • 08:28 ladsgroup@deploy1002: Started scap: Backport for ores-extension: fix arwiki likelybad threshold (T345305)
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52154 and previous config saved to /var/cache/conftool/dbconfig/20230831-082717-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52153 and previous config saved to /var/cache/conftool/dbconfig/20230831-082508-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52152 and previous config saved to /var/cache/conftool/dbconfig/20230831-082440-ladsgroup.json
  • 08:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr4-ulsfo
  • 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-ulsfo
  • 08:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
  • 08:23 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
  • 08:21 vgutierrez: set send_timeout to 3620s in the upload cluster via cumin to avoid a varnish restart https://gerrit.wikimedia.org/r/c/operations/puppet/+/953678 - T341755
  • 08:19 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-ulsfo
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52151 and previous config saved to /var/cache/conftool/dbconfig/20230831-081705-ladsgroup.json
  • 08:15 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
  • 08:15 ladsgroup@deploy1002: Finished scap: Backport for Disable user creation on wikitech (T345226) (duration: 10m 06s)
  • 08:14 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52150 and previous config saved to /var/cache/conftool/dbconfig/20230831-080934-ladsgroup.json
  • 08:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
  • 08:06 ladsgroup@deploy1002: ladsgroup and andrew: Continuing with sync
  • 08:06 ladsgroup@deploy1002: ladsgroup and andrew: Backport for Disable user creation on wikitech (T345226) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:04 ladsgroup@deploy1002: Started scap: Backport for Disable user creation on wikitech (T345226)
  • 08:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:03 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1009.eqiad.wmnet
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52149 and previous config saved to /var/cache/conftool/dbconfig/20230831-075709-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52148 and previous config saved to /var/cache/conftool/dbconfig/20230831-075428-ladsgroup.json
  • 07:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
  • 07:51 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52147 and previous config saved to /var/cache/conftool/dbconfig/20230831-073921-ladsgroup.json
  • 07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:37 apergos: UTC morning backport and config window done
  • 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52146 and previous config saved to /var/cache/conftool/dbconfig/20230831-073713-ladsgroup.json
  • 07:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52145 and previous config saved to /var/cache/conftool/dbconfig/20230831-073115-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52144 and previous config saved to /var/cache/conftool/dbconfig/20230831-072848-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52143 and previous config saved to /var/cache/conftool/dbconfig/20230831-071610-root.json
  • 07:15 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service for testwiki (duration: 10m 18s)
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52142 and previous config saved to /var/cache/conftool/dbconfig/20230831-071343-root.json
  • 07:09 kartik@deploy1002: abi and kartik: Continuing with sync
  • 07:07 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service for testwiki synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:05 kartik@deploy1002: Started scap: Backport for Enable MinT translation service for testwiki
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52141 and previous config saved to /var/cache/conftool/dbconfig/20230831-070105-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52140 and previous config saved to /var/cache/conftool/dbconfig/20230831-065838-root.json
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52139 and previous config saved to /var/cache/conftool/dbconfig/20230831-064601-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52138 and previous config saved to /var/cache/conftool/dbconfig/20230831-064333-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52137 and previous config saved to /var/cache/conftool/dbconfig/20230831-063056-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52136 and previous config saved to /var/cache/conftool/dbconfig/20230831-062829-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52135 and previous config saved to /var/cache/conftool/dbconfig/20230831-061551-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52134 and previous config saved to /var/cache/conftool/dbconfig/20230831-061324-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52133 and previous config saved to /var/cache/conftool/dbconfig/20230831-060047-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52132 and previous config saved to /var/cache/conftool/dbconfig/20230831-055819-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52131 and previous config saved to /var/cache/conftool/dbconfig/20230831-054805-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52130 and previous config saved to /var/cache/conftool/dbconfig/20230831-054542-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52129 and previous config saved to /var/cache/conftool/dbconfig/20230831-054314-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 T344309', diff saved to https://phabricator.wikimedia.org/P52128 and previous config saved to /var/cache/conftool/dbconfig/20230831-054305-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52127 and previous config saved to /var/cache/conftool/dbconfig/20230831-053300-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T345223', diff saved to https://phabricator.wikimedia.org/P52126 and previous config saved to /var/cache/conftool/dbconfig/20230831-053035-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T345223', diff saved to https://phabricator.wikimedia.org/P52125 and previous config saved to /var/cache/conftool/dbconfig/20230831-052852-marostegui.json
  • 05:28 marostegui: Starting s6 eqiad failover from db1131 to db1173 - T345223
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52123 and previous config saved to /var/cache/conftool/dbconfig/20230831-051755-root.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52122 and previous config saved to /var/cache/conftool/dbconfig/20230831-050250-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T345223', diff saved to https://phabricator.wikimedia.org/P52121 and previous config saved to /var/cache/conftool/dbconfig/20230831-045719-marostegui.json
  • 04:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T345223
  • 04:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T345223
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52120 and previous config saved to /var/cache/conftool/dbconfig/20230831-044746-root.json
  • 02:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 02:26 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 01:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 01:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 00:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 00:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']

2023-08-30

  • 22:28 krinkle@deploy1002: Synchronized php-1.41.0-wmf.24/extensions/WikimediaEvents/: 697ab03 (duration: 06m 26s)
  • 22:09 krinkle@deploy1002: Finished scap: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944) (duration: 35m 08s)
  • 21:57 krinkle@deploy1002: krinkle: Continuing with sync
  • 21:55 krinkle@deploy1002: krinkle: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:34 krinkle@deploy1002: Started scap: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944)
  • 21:31 Krinkle: krinkle@deploy1002: running `sudo /usr/local/sbin/fix-staging-perms` to fix permissions under /srv/patches/1.41.0-wmf.24 where 2 of the 3 patch files are read-only by jnuche:deployment
  • 20:44 hmonroy@deploy1002: Finished scap: Backport for Add comment about mirroring of wgMobileUrlTemplate (T344185) (duration: 07m 11s)
  • 20:37 hmonroy@deploy1002: Started scap: Backport for Add comment about mirroring of wgMobileUrlTemplate (T344185)
  • 20:33 hmonroy@deploy1002: Finished scap: Backport for Omit 'target' in the body of review REST API requests (duration: 08m 18s)
  • 20:27 hmonroy@deploy1002: matmarex and hmonroy: Continuing with sync
  • 20:26 hmonroy@deploy1002: matmarex and hmonroy: Backport for Omit 'target' in the body of review REST API requests synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:24 hmonroy@deploy1002: Started scap: Backport for Omit 'target' in the body of review REST API requests
  • 20:14 hmonroy@deploy1002: Finished scap: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754) (duration: 09m 13s)
  • 20:08 hmonroy@deploy1002: hmonroy: Continuing with sync
  • 20:06 hmonroy@deploy1002: hmonroy: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:05 hmonroy@deploy1002: Started scap: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754)
  • 19:19 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 19:17 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts wdqs1005.eqiad.wmnet
  • 18:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 18:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 18:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 18:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:28 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1005.eqiad.wmnet
  • 18:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bookworm
  • 18:15 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.24 refs T343726 (duration: 06m 13s)
  • 18:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.24 refs T343726
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 18:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:03 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 17:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh2001.wikimedia.org with OS bookworm
  • 16:36 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: replace thresholds with numeric values (T343308) (duration: 10m 09s)
  • 16:30 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 16:28 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: replace thresholds with numeric values (T343308) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:26 ladsgroup@deploy1002: Started scap: Backport for ores-extension: replace thresholds with numeric values (T343308)
  • 16:19 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 16:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 15:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bookworm
  • 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52113 and previous config saved to /var/cache/conftool/dbconfig/20230830-154437-ladsgroup.json
  • 15:44 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster2002.codfw.wmnet
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201', diff saved to https://phabricator.wikimedia.org/P52112 and previous config saved to /var/cache/conftool/dbconfig/20230830-153915-root.json
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P52111 and previous config saved to /var/cache/conftool/dbconfig/20230830-152931-ladsgroup.json
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 15:24 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b13-drmrs
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b13-drmrs
  • 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b12-drmrs
  • 15:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b12-drmrs
  • 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-bw27-esams
  • 15:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-bw27-esams
  • 15:19 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P52110 and previous config saved to /var/cache/conftool/dbconfig/20230830-151424-ladsgroup.json
  • 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-by27-esams
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-by27-esams
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh4001.wikimedia.org with OS bookworm
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52109 and previous config saved to /var/cache/conftool/dbconfig/20230830-150709-ladsgroup.json
  • 15:00 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52108 and previous config saved to /var/cache/conftool/dbconfig/20230830-145918-ladsgroup.json
  • 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "POP switches - ayounsi@cumin1001"
  • 14:55 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "POP switches - ayounsi@cumin1001"
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52107 and previous config saved to /var/cache/conftool/dbconfig/20230830-145457-ladsgroup.json
  • 14:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52106 and previous config saved to /var/cache/conftool/dbconfig/20230830-145205-ladsgroup.json
  • 14:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:49 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:49 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 14:41 fab@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:40 jynus: disable bacula backup1002, backup2002 jobs
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52105 and previous config saved to /var/cache/conftool/dbconfig/20230830-143950-ladsgroup.json
  • 14:39 fab@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:39 fab@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:37 Amir1: dbmaint on s4@codfw (T207253)
  • 14:37 fab@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52104 and previous config saved to /var/cache/conftool/dbconfig/20230830-143700-ladsgroup.json
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 14:34 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:33 fab@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:30 fab@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52103 and previous config saved to /var/cache/conftool/dbconfig/20230830-142737-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52102 and previous config saved to /var/cache/conftool/dbconfig/20230830-142716-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52101 and previous config saved to /var/cache/conftool/dbconfig/20230830-142444-ladsgroup.json
  • 14:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bookworm
  • 14:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52100 and previous config saved to /var/cache/conftool/dbconfig/20230830-142155-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P52099 and previous config saved to /var/cache/conftool/dbconfig/20230830-141210-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52098 and previous config saved to /var/cache/conftool/dbconfig/20230830-140938-ladsgroup.json
  • 14:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P52097 and previous config saved to /var/cache/conftool/dbconfig/20230830-135704-ladsgroup.json
  • 13:49 topranks: disabling DHCP snooping on mr1-codfw to test ztp operation
  • 13:47 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
  • 13:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 13:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2038']
  • 13:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52096 and previous config saved to /var/cache/conftool/dbconfig/20230830-134232-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52095 and previous config saved to /var/cache/conftool/dbconfig/20230830-134209-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52094 and previous config saved to /var/cache/conftool/dbconfig/20230830-134157-ladsgroup.json
  • 13:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 13:41 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 13:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52093 and previous config saved to /var/cache/conftool/dbconfig/20230830-133745-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2027']
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52092 and previous config saved to /var/cache/conftool/dbconfig/20230830-133724-ladsgroup.json
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2026']
  • 13:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2026']
  • 13:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh6001.wikimedia.org with OS bookworm
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2039']
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2038']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2037']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2035']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2034']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2033']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2036']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2032']
  • 13:33 moritzm: failover ganeti master in eqiad to ganeti1027
  • 13:32 taavi@deploy1002: Finished scap: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown (duration: 08m 10s)
  • 13:28 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:27 taavi@deploy1002: taavi: Continuing with sync
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52091 and previous config saved to /var/cache/conftool/dbconfig/20230830-132703-ladsgroup.json
  • 13:25 taavi@deploy1002: taavi: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2037']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
  • 13:24 taavi@deploy1002: Started scap: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2035']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2034']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2033']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2032']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2031']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2030']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2029']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2028']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2027']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2026']
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P52090 and previous config saved to /var/cache/conftool/dbconfig/20230830-132218-ladsgroup.json
  • 13:21 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
  • 13:21 jiji@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 13:20 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
  • 13:18 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:15 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on all projects (T336763) (duration: 09m 29s)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52089 and previous config saved to /var/cache/conftool/dbconfig/20230830-131157-ladsgroup.json
  • 13:10 samtar@deploy1002: samtar: Continuing with sync
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2031']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2030']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2029']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2028']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2026']
  • 13:08 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on all projects (T336763) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P52088 and previous config saved to /var/cache/conftool/dbconfig/20230830-130712-ladsgroup.json
  • 13:06 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on all projects (T336763)
  • 13:04 samtar@deploy1002: backport Cancelled
  • 13:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:02 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:02 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 13:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw2-22-ulsfo
  • 13:02 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 13:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 13:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 12:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:58 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 12:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52087 and previous config saved to /var/cache/conftool/dbconfig/20230830-125650-ladsgroup.json
  • 12:56 elukey: restart kubelet on ml-serve1001 to clear prometheus metrics
  • 12:55 taavi@deploy1002: Finished scap: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219) (duration: 11m 28s)
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 12:53 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:53 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:53 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52086 and previous config saved to /var/cache/conftool/dbconfig/20230830-125206-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52085 and previous config saved to /var/cache/conftool/dbconfig/20230830-124954-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52084 and previous config saved to /var/cache/conftool/dbconfig/20230830-124933-ladsgroup.json
  • 12:47 taavi@deploy1002: sukhe and taavi: Continuing with sync
  • 12:46 taavi@deploy1002: sukhe and taavi: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:46 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 12:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 12:43 taavi@deploy1002: Started scap: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219)
  • 12:43 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: fix thresholds (T343308) (duration: 25m 53s)
  • 12:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 12:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P52083 and previous config saved to /var/cache/conftool/dbconfig/20230830-123427-ladsgroup.json
  • 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52082 and previous config saved to /var/cache/conftool/dbconfig/20230830-123001-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:29 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52081 and previous config saved to /var/cache/conftool/dbconfig/20230830-122940-ladsgroup.json
  • 12:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:27 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 12:19 ladsgroup@deploy1002: isaranto and ladsgroup: Continuing with sync
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P52080 and previous config saved to /var/cache/conftool/dbconfig/20230830-121921-ladsgroup.json
  • 12:19 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: fix thresholds (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 12:17 ladsgroup@deploy1002: Started scap: Backport for ores-extension: fix thresholds (T343308)
  • 12:16 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52079 and previous config saved to /var/cache/conftool/dbconfig/20230830-121433-ladsgroup.json
  • 12:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 12:12 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:10 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52078 and previous config saved to /var/cache/conftool/dbconfig/20230830-120511-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52077 and previous config saved to /var/cache/conftool/dbconfig/20230830-120415-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52076 and previous config saved to /var/cache/conftool/dbconfig/20230830-115927-ladsgroup.json
  • 11:59 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 11:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 11:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:52 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 11:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52074 and previous config saved to /var/cache/conftool/dbconfig/20230830-115005-ladsgroup.json
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 11:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52073 and previous config saved to /var/cache/conftool/dbconfig/20230830-114421-ladsgroup.json
  • 11:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52072 and previous config saved to /var/cache/conftool/dbconfig/20230830-113728-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52071 and previous config saved to /var/cache/conftool/dbconfig/20230830-113656-ladsgroup.json
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52070 and previous config saved to /var/cache/conftool/dbconfig/20230830-113459-ladsgroup.json
  • 11:34 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P52069 and previous config saved to /var/cache/conftool/dbconfig/20230830-112150-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52068 and previous config saved to /var/cache/conftool/dbconfig/20230830-111952-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52067 and previous config saved to /var/cache/conftool/dbconfig/20230830-111720-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52066 and previous config saved to /var/cache/conftool/dbconfig/20230830-111659-ladsgroup.json
  • 11:16 jbond: switch cumin to the puppetdb api micro service Gerrit:953203
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 11:12 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:12 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52065 and previous config saved to /var/cache/conftool/dbconfig/20230830-111143-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 11:11 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52064 and previous config saved to /var/cache/conftool/dbconfig/20230830-111118-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52063 and previous config saved to /var/cache/conftool/dbconfig/20230830-110800-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P52062 and previous config saved to /var/cache/conftool/dbconfig/20230830-110644-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52061 and previous config saved to /var/cache/conftool/dbconfig/20230830-110152-ladsgroup.json
  • 11:01 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 11:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 10:57 XioNoX: enable mgmt_junos on fasw-c-codfw - T327862
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 10:56 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52060 and previous config saved to /var/cache/conftool/dbconfig/20230830-105612-ladsgroup.json
  • 10:55 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:54 moritzm: installing grub2 updates from bullseye point release
  • 10:53 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:52 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52059 and previous config saved to /var/cache/conftool/dbconfig/20230830-105254-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52058 and previous config saved to /var/cache/conftool/dbconfig/20230830-105138-ladsgroup.json
  • 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 10:48 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52057 and previous config saved to /var/cache/conftool/dbconfig/20230830-104646-ladsgroup.json
  • 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1006
  • 10:41 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1006
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52056 and previous config saved to /var/cache/conftool/dbconfig/20230830-104105-ladsgroup.json
  • 10:38 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52055 and previous config saved to /var/cache/conftool/dbconfig/20230830-103747-ladsgroup.json
  • 10:35 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 10:35 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 10:32 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52054 and previous config saved to /var/cache/conftool/dbconfig/20230830-103140-ladsgroup.json
  • 10:28 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52053 and previous config saved to /var/cache/conftool/dbconfig/20230830-102559-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52052 and previous config saved to /var/cache/conftool/dbconfig/20230830-102452-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52051 and previous config saved to /var/cache/conftool/dbconfig/20230830-102432-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52050 and previous config saved to /var/cache/conftool/dbconfig/20230830-102241-ladsgroup.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:16 godog: +50g to prometheus eqiad 'services' instance
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 10:14 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52049 and previous config saved to /var/cache/conftool/dbconfig/20230830-101437-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52048 and previous config saved to /var/cache/conftool/dbconfig/20230830-101410-ladsgroup.json
  • 10:13 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:12 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P52047 and previous config saved to /var/cache/conftool/dbconfig/20230830-100926-ladsgroup.json
  • 10:07 jiji@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 10:06 effie: Rolling reboot codfw wikikube k8s nodes
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52046 and previous config saved to /var/cache/conftool/dbconfig/20230830-100413-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52045 and previous config saved to /var/cache/conftool/dbconfig/20230830-100351-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52044 and previous config saved to /var/cache/conftool/dbconfig/20230830-095903-ladsgroup.json
  • 09:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P52043 and previous config saved to /var/cache/conftool/dbconfig/20230830-095419-ladsgroup.json
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52042 and previous config saved to /var/cache/conftool/dbconfig/20230830-094845-ladsgroup.json
  • 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52041 and previous config saved to /var/cache/conftool/dbconfig/20230830-094357-ladsgroup.json
  • 09:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52040 and previous config saved to /var/cache/conftool/dbconfig/20230830-093913-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52039 and previous config saved to /var/cache/conftool/dbconfig/20230830-093339-ladsgroup.json
  • 09:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:32 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 09:31 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 09:30 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:28 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52038 and previous config saved to /var/cache/conftool/dbconfig/20230830-092851-ladsgroup.json
  • 09:28 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:27 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 09:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 09:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52037 and previous config saved to /var/cache/conftool/dbconfig/20230830-092255-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52036 and previous config saved to /var/cache/conftool/dbconfig/20230830-092228-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 09:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52035 and previous config saved to /var/cache/conftool/dbconfig/20230830-092147-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52034 and previous config saved to /var/cache/conftool/dbconfig/20230830-091922-root.json
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52033 and previous config saved to /var/cache/conftool/dbconfig/20230830-091833-ladsgroup.json
  • 09:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52032 and previous config saved to /var/cache/conftool/dbconfig/20230830-091610-ladsgroup.json
  • 09:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52031 and previous config saved to /var/cache/conftool/dbconfig/20230830-091544-ladsgroup.json
  • 09:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw-c1a-eqiad
  • 09:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52030 and previous config saved to /var/cache/conftool/dbconfig/20230830-091242-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52029 and previous config saved to /var/cache/conftool/dbconfig/20230830-091203-ladsgroup.json
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:10 ladsgroup@deploy1002: Finished scap: Backport for Allow setting configurations through rtl dblist (duration: 08m 52s)
  • 09:09 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device fasw-c1a-eqiad
  • 09:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw-c8a-codfw
  • 09:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030', diff saved to https://phabricator.wikimedia.org/P52028 and previous config saved to /var/cache/conftool/dbconfig/20230830-090749-ladsgroup.json
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device fasw-c8a-codfw
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-d2-codfw
  • 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:04 ladsgroup@deploy1002: ladsgroup and zabe: Continuing with sync
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52027 and previous config saved to /var/cache/conftool/dbconfig/20230830-090415-root.json
  • 09:03 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-d2-codfw
  • 09:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-c2-codfw
  • 09:02 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Allow setting configurations through rtl dblist synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 09:01 ladsgroup@deploy1002: Started scap: Backport for Allow setting configurations through rtl dblist
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52026 and previous config saved to /var/cache/conftool/dbconfig/20230830-090038-ladsgroup.json
  • 09:00 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-c2-codfw
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-b2-codfw
  • 08:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ores2008.codfw.wmnet
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-b2-codfw
  • 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-0604-eqsin
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P52025 and previous config saved to /var/cache/conftool/dbconfig/20230830-085657-ladsgroup.json
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-0604-eqsin
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-d7-eqiad
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030', diff saved to https://phabricator.wikimedia.org/P52024 and previous config saved to /var/cache/conftool/dbconfig/20230830-085243-ladsgroup.json
  • 08:51 Emperor: stopping puppet to fix broken drive labelling after disk swap thanos-be1003 T345079
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-d7-eqiad
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-c2-eqiad
  • 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52023 and previous config saved to /var/cache/conftool/dbconfig/20230830-084911-root.json
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-a2-codfw
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52021 and previous config saved to /var/cache/conftool/dbconfig/20230830-084532-ladsgroup.json
  • 08:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-a2-codfw
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P52020 and previous config saved to /var/cache/conftool/dbconfig/20230830-084151-ladsgroup.json
  • 08:38 XioNoX: set bgp-error-tolerance on all sessions - T340111
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52019 and previous config saved to /var/cache/conftool/dbconfig/20230830-083737-ladsgroup.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52018 and previous config saved to /var/cache/conftool/dbconfig/20230830-083406-root.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52017 and previous config saved to /var/cache/conftool/dbconfig/20230830-083246-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52016 and previous config saved to /var/cache/conftool/dbconfig/20230830-083220-ladsgroup.json
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52015 and previous config saved to /var/cache/conftool/dbconfig/20230830-083025-ladsgroup.json
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52014 and previous config saved to /var/cache/conftool/dbconfig/20230830-082645-ladsgroup.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52013 and previous config saved to /var/cache/conftool/dbconfig/20230830-081901-root.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P52012 and previous config saved to /var/cache/conftool/dbconfig/20230830-081714-ladsgroup.json
  • 08:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52011 and previous config saved to /var/cache/conftool/dbconfig/20230830-080356-root.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P52010 and previous config saved to /var/cache/conftool/dbconfig/20230830-080208-ladsgroup.json
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52009 and previous config saved to /var/cache/conftool/dbconfig/20230830-075956-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P52008 and previous config saved to /var/cache/conftool/dbconfig/20230830-075934-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52007 and previous config saved to /var/cache/conftool/dbconfig/20230830-075736-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:57 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 07:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1128.eqiad.wmnet with OS bullseye
  • 07:50 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1129.eqiad.wmnet with OS bullseye
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 3%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52006 and previous config saved to /var/cache/conftool/dbconfig/20230830-074852-root.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52005 and previous config saved to /var/cache/conftool/dbconfig/20230830-074702-ladsgroup.json
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52004 and previous config saved to /var/cache/conftool/dbconfig/20230830-074428-ladsgroup.json
  • 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52003 and previous config saved to /var/cache/conftool/dbconfig/20230830-074238-root.json
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52002 and previous config saved to /var/cache/conftool/dbconfig/20230830-074202-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P52001 and previous config saved to /var/cache/conftool/dbconfig/20230830-073514-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 1%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52000 and previous config saved to /var/cache/conftool/dbconfig/20230830-073347-root.json
  • 07:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 upgrade to mariadb 10.4.31', diff saved to https://phabricator.wikimedia.org/P51999 and previous config saved to /var/cache/conftool/dbconfig/20230830-073144-root.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51998 and previous config saved to /var/cache/conftool/dbconfig/20230830-072922-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51997 and previous config saved to /var/cache/conftool/dbconfig/20230830-072902-ladsgroup.json
  • 07:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: host reimage
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51996 and previous config saved to /var/cache/conftool/dbconfig/20230830-072733-root.json
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 07:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 07:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 07:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 07:22 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: host reimage
  • 07:22 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51995 and previous config saved to /var/cache/conftool/dbconfig/20230830-072009-root.json
  • 07:19 ladsgroup@deploy1002: Finished scap: Backport for Disable search result deduplication. (T341227) (duration: 15m 53s)
  • 07:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:17 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P51994 and previous config saved to /var/cache/conftool/dbconfig/20230830-071416-ladsgroup.json
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51993 and previous config saved to /var/cache/conftool/dbconfig/20230830-071356-ladsgroup.json
  • 07:13 ladsgroup@deploy1002: ladsgroup and pfischer: Continuing with sync
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51992 and previous config saved to /var/cache/conftool/dbconfig/20230830-071228-root.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P51991 and previous config saved to /var/cache/conftool/dbconfig/20230830-071152-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 07:09 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1129.eqiad.wmnet with OS bullseye
  • 07:09 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1128.eqiad.wmnet with OS bullseye
  • 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51990 and previous config saved to /var/cache/conftool/dbconfig/20230830-070504-root.json
  • 07:04 ladsgroup@deploy1002: ladsgroup and pfischer: Backport for Disable search result deduplication. (T341227) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 07:03 ladsgroup@deploy1002: Started scap: Backport for Disable search result deduplication. (T341227)
  • 07:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 07:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1127.eqiad.wmnet with OS bullseye
  • 06:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1126.eqiad.wmnet with OS bullseye
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51989 and previous config saved to /var/cache/conftool/dbconfig/20230830-065849-ladsgroup.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51988 and previous config saved to /var/cache/conftool/dbconfig/20230830-065723-root.json
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 06:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51987 and previous config saved to /var/cache/conftool/dbconfig/20230830-064959-root.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51986 and previous config saved to /var/cache/conftool/dbconfig/20230830-064343-ladsgroup.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51985 and previous config saved to /var/cache/conftool/dbconfig/20230830-064219-root.json
  • 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51984 and previous config saved to /var/cache/conftool/dbconfig/20230830-064131-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: host reimage
  • 06:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: host reimage
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51983 and previous config saved to /var/cache/conftool/dbconfig/20230830-063455-root.json
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 06:33 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: host reimage
  • 06:32 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: host reimage
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51982 and previous config saved to /var/cache/conftool/dbconfig/20230830-062714-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51981 and previous config saved to /var/cache/conftool/dbconfig/20230830-061950-root.json
  • 06:19 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1127.eqiad.wmnet with OS bullseye
  • 06:18 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1126.eqiad.wmnet with OS bullseye
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51980 and previous config saved to /var/cache/conftool/dbconfig/20230830-061209-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 3%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51979 and previous config saved to /var/cache/conftool/dbconfig/20230830-060445-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51978 and previous config saved to /var/cache/conftool/dbconfig/20230830-055704-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P51977 and previous config saved to /var/cache/conftool/dbconfig/20230830-055034-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 1%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51976 and previous config saved to /var/cache/conftool/dbconfig/20230830-054940-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P51975 and previous config saved to /var/cache/conftool/dbconfig/20230830-054248-root.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51974 and previous config saved to /var/cache/conftool/dbconfig/20230830-051543-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51973 and previous config saved to /var/cache/conftool/dbconfig/20230830-050036-ladsgroup.json
  • 04:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51972 and previous config saved to /var/cache/conftool/dbconfig/20230830-044530-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51971 and previous config saved to /var/cache/conftool/dbconfig/20230830-043024-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51970 and previous config saved to /var/cache/conftool/dbconfig/20230830-014730-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51969 and previous config saved to /var/cache/conftool/dbconfig/20230830-014702-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51968 and previous config saved to /var/cache/conftool/dbconfig/20230830-013156-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51967 and previous config saved to /var/cache/conftool/dbconfig/20230830-011650-ladsgroup.json
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51966 and previous config saved to /var/cache/conftool/dbconfig/20230830-010144-ladsgroup.json
  • 01:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2039.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2039.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51965 and previous config saved to /var/cache/conftool/dbconfig/20230830-002108-ladsgroup.json
  • 00:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51964 and previous config saved to /var/cache/conftool/dbconfig/20230830-000602-ladsgroup.json
  • 00:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2028.mgmt.codfw.wmnet with reboot policy FORCED

2023-08-29

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51963 and previous config saved to /var/cache/conftool/dbconfig/20230829-235055-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51962 and previous config saved to /var/cache/conftool/dbconfig/20230829-233549-ladsgroup.json
  • 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2034 hosts in codfw - jhancock@cumin2002"
  • 23:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2034 hosts in codfw - jhancock@cumin2002"
  • 23:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:01 eileen: civicrm upgraded from fc5c73db to 29ce9ac0
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51961 and previous config saved to /var/cache/conftool/dbconfig/20230829-225031-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51960 and previous config saved to /var/cache/conftool/dbconfig/20230829-225010-ladsgroup.json
  • 22:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51959 and previous config saved to /var/cache/conftool/dbconfig/20230829-223504-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51958 and previous config saved to /var/cache/conftool/dbconfig/20230829-221958-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51957 and previous config saved to /var/cache/conftool/dbconfig/20230829-220451-ladsgroup.json
  • 21:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:36 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase1030.eqiad.wmnet
  • 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51956 and previous config saved to /var/cache/conftool/dbconfig/20230829-212619-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51955 and previous config saved to /var/cache/conftool/dbconfig/20230829-212558-ladsgroup.json
  • 21:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2026 hosts in codfw - jhancock@cumin2002"
  • 21:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2026 hosts in codfw - jhancock@cumin2002"
  • 21:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51954 and previous config saved to /var/cache/conftool/dbconfig/20230829-211052-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51953 and previous config saved to /var/cache/conftool/dbconfig/20230829-205546-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51952 and previous config saved to /var/cache/conftool/dbconfig/20230829-204039-ladsgroup.json
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797) (duration: 07m 13s)
  • 20:10 urbanecm@deploy1002: urbanecm and dreamyjazz: Continuing with sync
  • 20:10 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:09 urbanecm@deploy1002: Started scap: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797)
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51951 and previous config saved to /var/cache/conftool/dbconfig/20230829-195215-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51950 and previous config saved to /var/cache/conftool/dbconfig/20230829-195154-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P51949 and previous config saved to /var/cache/conftool/dbconfig/20230829-193648-ladsgroup.json
  • 19:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
  • 19:32 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw2-c2-eqiad
  • 19:32 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 19:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad
  • 19:30 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
  • 19:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
  • 19:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
  • 19:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f3-eqiad
  • 19:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
  • 19:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f3-eqiad
  • 19:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f2-eqiad
  • 19:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 19:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f2-eqiad
  • 19:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-eqiad
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P51948 and previous config saved to /var/cache/conftool/dbconfig/20230829-192141-ladsgroup.json
  • 19:20 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f1-eqiad
  • 19:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e3-eqiad
  • 19:18 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e3-eqiad
  • 19:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e2-eqiad
  • 19:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 19:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:16 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e2-eqiad
  • 19:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e1-eqiad
  • 19:13 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e1-eqiad
  • 19:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
  • 19:11 eileen: civicrm upgraded from d13e6e0c to fc5c73db
  • 19:10 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
  • 19:10 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 19:09 eileen: civicrm upgraded from d13e6e0c to fc5c73db
  • 19:07 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 08s)
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51947 and previous config saved to /var/cache/conftool/dbconfig/20230829-190635-ladsgroup.json
  • 19:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 19:00 zabe@deploy1002: Started scap: update interwiki cache
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 18:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 18:37 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1002.eqiad.wmnet
  • 18:37 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host vrts1002.eqiad.wmnet with OS bullseye
  • 18:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
  • 18:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
  • 18:24 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1002.eqiad.wmnet with reason: host reimage
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51946 and previous config saved to /var/cache/conftool/dbconfig/20230829-182251-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51945 and previous config saved to /var/cache/conftool/dbconfig/20230829-182225-ladsgroup.json
  • 18:21 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1002.eqiad.wmnet with reason: host reimage
  • 18:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.24 refs T343726
  • 18:12 aokoth@cumin1001: START - Cookbook sre.hosts.reimage for host vrts1002.eqiad.wmnet with OS bullseye
  • 18:10 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:09 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 18:08 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:08 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51944 and previous config saved to /var/cache/conftool/dbconfig/20230829-180719-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51943 and previous config saved to /var/cache/conftool/dbconfig/20230829-180613-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:04 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 18:04 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1002.eqiad.wmnet
  • 18:00 urbanecm@deploy1002: Finished scap: Backport for Growth: Disable Add an image on all wikis (T345188) (duration: 06m 47s)
  • 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 17:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 17:53 urbanecm@deploy1002: Started scap: Backport for Growth: Disable Add an image on all wikis (T345188)
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51942 and previous config saved to /var/cache/conftool/dbconfig/20230829-175213-ladsgroup.json
  • 17:51 jhuneidi@deploy1002: Pruned MediaWiki: 1.41.0-wmf.22 (duration: 02m 11s)
  • 17:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 17:48 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.24 refs T343726 (duration: 43m 27s)
  • 17:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51941 and previous config saved to /var/cache/conftool/dbconfig/20230829-173707-ladsgroup.json
  • 17:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
  • 17:05 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.24 refs T343726
  • 16:31 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:26 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:25 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:24 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:23 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:23 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
  • 16:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 16:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 16:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 16:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 16:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:13 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308) (duration: 09m 31s)
  • 16:12 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 16:11 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:10 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:09 akosiaris: deploy cxserver mariadb egress functionality. T341117
  • 16:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
  • 16:09 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:09 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:08 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 16:07 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:05 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 16:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@deploy1002: Started scap: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308)
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51939 and previous config saved to /var/cache/conftool/dbconfig/20230829-160415-ladsgroup.json
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51938 and previous config saved to /var/cache/conftool/dbconfig/20230829-160144-ladsgroup.json
  • 15:58 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
  • 15:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51937 and previous config saved to /var/cache/conftool/dbconfig/20230829-154909-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51936 and previous config saved to /var/cache/conftool/dbconfig/20230829-154638-ladsgroup.json
  • 15:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:38 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51935 and previous config saved to /var/cache/conftool/dbconfig/20230829-153801-ladsgroup.json
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudservices1006']
  • 15:37 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 15:35 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51934 and previous config saved to /var/cache/conftool/dbconfig/20230829-153403-ladsgroup.json
  • 15:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006']
  • 15:31 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1125.eqiad.wmnet with OS bullseye
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51933 and previous config saved to /var/cache/conftool/dbconfig/20230829-153132-ladsgroup.json
  • 15:30 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host vrts1002.eqiad.wmnet
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 15:30 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:30 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 15:29 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:28 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw2-c2-eqiad
  • 15:28 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1124.eqiad.wmnet with OS bullseye
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 15:27 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 15:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-b2-eqiad
  • 15:27 ladsgroup@deploy1002: Finished scap: Creating tlywiki (T345166) (duration: 07m 03s)
  • 15:25 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 15:25 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 15:25 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:24 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:24 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:24 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-b2-eqiad
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-a7-eqiad
  • 15:23 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:23 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51932 and previous config saved to /var/cache/conftool/dbconfig/20230829-152255-ladsgroup.json
  • 15:22 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:22 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:22 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 15:22 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1002.eqiad.wmnet
  • 15:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 15:21 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:21 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:21 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:21 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-a7-eqiad
  • 15:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-22-ulsfo
  • 15:20 ladsgroup@deploy1002: Started scap: Creating tlywiki (T345166)
  • 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51931 and previous config saved to /var/cache/conftool/dbconfig/20230829-151857-ladsgroup.json
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 15:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-by27-esams
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51930 and previous config saved to /var/cache/conftool/dbconfig/20230829-151625-ladsgroup.json
  • 15:16 ladsgroup@deploy1002: Finished scap: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921) (duration: 11m 46s)
  • 15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: sync
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51929 and previous config saved to /var/cache/conftool/dbconfig/20230829-151423-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51928 and previous config saved to /var/cache/conftool/dbconfig/20230829-151402-ladsgroup.json
  • 15:13 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-by27-esams
  • 15:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-bw27-esams
  • 15:10 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:09 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-bw27-esams
  • 15:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b13-drmrs
  • 15:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: host reimage
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51927 and previous config saved to /var/cache/conftool/dbconfig/20230829-150749-ladsgroup.json
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 15:06 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: host reimage
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 15:06 ladsgroup@deploy1002: ladsgroup: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
  • 15:04 ladsgroup@deploy1002: Started scap: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921)
  • 15:04 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: host reimage
  • 15:04 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b13-drmrs
  • 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b12-drmrs
  • 15:03 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: host reimage
  • 15:02 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 14:59 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b12-drmrs
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P51926 and previous config saved to /var/cache/conftool/dbconfig/20230829-145856-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51925 and previous config saved to /var/cache/conftool/dbconfig/20230829-145727-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
  • 14:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51924 and previous config saved to /var/cache/conftool/dbconfig/20230829-145705-ladsgroup.json
  • 14:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
  • 14:56 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 14:56 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
  • 14:56 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:55 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:55 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:54 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
  • 14:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-f4-eqiad
  • 14:54 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:54 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51923 and previous config saved to /var/cache/conftool/dbconfig/20230829-145242-ladsgroup.json
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-f4-eqiad
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-e4-eqiad
  • 14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir5002.eqsin.wmnet} and A:ncredir
  • 14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir6002.drmrs.wmnet} and A:ncredir
  • 14:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1125.eqiad.wmnet with OS bullseye
  • 14:50 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1124.eqiad.wmnet with OS bullseye
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-e4-eqiad
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-d5-eqiad
  • 14:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
  • 14:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 14:47 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir6002.drmrs.wmnet} and A:ncredir
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-d5-eqiad
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-c8-eqiad
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir4002.ulsfo.wmnet} and A:ncredir
  • 14:46 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5002.eqsin.wmnet} and A:ncredir
  • 14:46 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir2002.codfw.wmnet} and A:ncredir
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir1002.eqiad.wmnet} and A:ncredir
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-c8-eqiad
  • 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P51922 and previous config saved to /var/cache/conftool/dbconfig/20230829-144349-ladsgroup.json
  • 14:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir4002.ulsfo.wmnet} and A:ncredir
  • 14:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir2002.codfw.wmnet} and A:ncredir
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P51921 and previous config saved to /var/cache/conftool/dbconfig/20230829-144159-ladsgroup.json
  • 14:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir1002.eqiad.wmnet} and A:ncredir
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir2001.codfw.wmnet} and A:ncredir
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir6001.drmrs.wmnet} and A:ncredir
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 14:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
  • 14:38 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 14:37 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir2001.codfw.wmnet} and A:ncredir
  • 14:37 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir6001.drmrs.wmnet} and A:ncredir
  • 14:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 14:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir5001.eqsin.wmnet} and A:ncredir
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51920 and previous config saved to /var/cache/conftool/dbconfig/20230829-143434-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51919 and previous config saved to /var/cache/conftool/dbconfig/20230829-143413-ladsgroup.json
  • 14:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-22-ulsfo
  • 14:31 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir4001.ulsfo.wmnet} and A:ncredir
  • 14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 14:30 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr4-ulsfo
  • 14:28 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5001.eqsin.wmnet} and A:ncredir
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51918 and previous config saved to /var/cache/conftool/dbconfig/20230829-142843-ladsgroup.json
  • 14:28 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=97) rolling reboot on P{ncredir5001.*} and A:ncredir
  • 14:28 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5001.*} and A:ncredir
  • 14:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
  • 14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir4001.ulsfo.wmnet} and A:ncredir
  • 14:27 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir1001.eqiad.wmnet} and A:ncredir
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P51917 and previous config saved to /var/cache/conftool/dbconfig/20230829-142653-ladsgroup.json
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 14:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr4-ulsfo
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 14:22 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir1001.eqiad.wmnet} and A:ncredir
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51916 and previous config saved to /var/cache/conftool/dbconfig/20230829-141907-ladsgroup.json
  • 14:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51915 and previous config saved to /var/cache/conftool/dbconfig/20230829-141147-ladsgroup.json
  • 14:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 14:08 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 14:08 fabfur: start rebooting ncredir hosts for T344587
  • 14:07 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:07 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 14:06 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1017.eqiad.wmnet
  • 14:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:06 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 14:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 14:05 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 14:05 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1017.eqiad.wmnet
  • 14:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:05 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 14:05 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51914 and previous config saved to /var/cache/conftool/dbconfig/20230829-140400-ladsgroup.json
  • 13:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:56 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
  • 13:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
  • 13:53 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
  • 13:53 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51913 and previous config saved to /var/cache/conftool/dbconfig/20230829-135236-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51912 and previous config saved to /var/cache/conftool/dbconfig/20230829-135214-ladsgroup.json
  • 13:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2025']
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51911 and previous config saved to /var/cache/conftool/dbconfig/20230829-134854-ladsgroup.json
  • 13:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 13:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 13:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:38 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51910 and previous config saved to /var/cache/conftool/dbconfig/20230829-133708-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51909 and previous config saved to /var/cache/conftool/dbconfig/20230829-133018-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51908 and previous config saved to /var/cache/conftool/dbconfig/20230829-132957-ladsgroup.json
  • 13:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:28 moritzm: installing openssl security updates on buster
  • 13:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51907 and previous config saved to /var/cache/conftool/dbconfig/20230829-132202-ladsgroup.json
  • 13:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 13:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51906 and previous config saved to /var/cache/conftool/dbconfig/20230829-131451-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51905 and previous config saved to /var/cache/conftool/dbconfig/20230829-130656-ladsgroup.json
  • 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51904 and previous config saved to /var/cache/conftool/dbconfig/20230829-125944-ladsgroup.json
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51903 and previous config saved to /var/cache/conftool/dbconfig/20230829-125844-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51902 and previous config saved to /var/cache/conftool/dbconfig/20230829-125823-ladsgroup.json
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51901 and previous config saved to /var/cache/conftool/dbconfig/20230829-124750-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51900 and previous config saved to /var/cache/conftool/dbconfig/20230829-124739-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51899 and previous config saved to /var/cache/conftool/dbconfig/20230829-124438-ladsgroup.json
  • 12:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51898 and previous config saved to /var/cache/conftool/dbconfig/20230829-124317-ladsgroup.json
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51897 and previous config saved to /var/cache/conftool/dbconfig/20230829-123233-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51896 and previous config saved to /var/cache/conftool/dbconfig/20230829-122811-ladsgroup.json
  • 12:27 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:24 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51894 and previous config saved to /var/cache/conftool/dbconfig/20230829-122403-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51893 and previous config saved to /var/cache/conftool/dbconfig/20230829-122342-ladsgroup.json
  • 12:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51892 and previous config saved to /var/cache/conftool/dbconfig/20230829-121727-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51891 and previous config saved to /var/cache/conftool/dbconfig/20230829-121305-ladsgroup.json
  • 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51890 and previous config saved to /var/cache/conftool/dbconfig/20230829-120835-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51889 and previous config saved to /var/cache/conftool/dbconfig/20230829-120603-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51888 and previous config saved to /var/cache/conftool/dbconfig/20230829-120221-ladsgroup.json
  • 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51887 and previous config saved to /var/cache/conftool/dbconfig/20230829-115329-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51886 and previous config saved to /var/cache/conftool/dbconfig/20230829-114326-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51885 and previous config saved to /var/cache/conftool/dbconfig/20230829-114304-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51884 and previous config saved to /var/cache/conftool/dbconfig/20230829-113823-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51883 and previous config saved to /var/cache/conftool/dbconfig/20230829-112758-ladsgroup.json
  • 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51882 and previous config saved to /var/cache/conftool/dbconfig/20230829-111949-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:19 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51881 and previous config saved to /var/cache/conftool/dbconfig/20230829-111927-ladsgroup.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51880 and previous config saved to /var/cache/conftool/dbconfig/20230829-111252-ladsgroup.json
  • 11:08 moritzm: installing nftables bugfix updates from Bullseye point release
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51879 and previous config saved to /var/cache/conftool/dbconfig/20230829-110421-ladsgroup.json
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 10:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51878 and previous config saved to /var/cache/conftool/dbconfig/20230829-105746-ladsgroup.json
  • 10:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-ulsfo
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 10:51 joal@deploy1002: Finished deploy [airflow-dags/analytics@90f280e]: Regular deploy of Analytics airflow dags [airflow-dags/analytics@90f280ec] (duration: 00m 14s)
  • 10:51 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-ulsfo
  • 10:51 joal@deploy1002: Started deploy [airflow-dags/analytics@90f280e]: Regular deploy of Analytics airflow dags [airflow-dags/analytics@90f280ec]
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51877 and previous config saved to /var/cache/conftool/dbconfig/20230829-104915-ladsgroup.json
  • 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 10:42 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 10:42 cgoubert@deploy1002: Finished scap: Removing mw-on-k8s tls-proxy CPU limits - T344814 (duration: 02m 27s)
  • 10:39 cgoubert@deploy1002: Started scap: Removing mw-on-k8s tls-proxy CPU limits - T344814
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51876 and previous config saved to /var/cache/conftool/dbconfig/20230829-103901-ladsgroup.json
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51875 and previous config saved to /var/cache/conftool/dbconfig/20230829-103840-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51874 and previous config saved to /var/cache/conftool/dbconfig/20230829-103409-ladsgroup.json
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 10:30 claime: Running puppet on deploy servers to bump envoy image version - T344814
  • 10:27 jynus: reboot db1204
  • 10:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1122.eqiad.wmnet with OS bullseye
  • 10:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1123.eqiad.wmnet with OS bullseye
  • 10:24 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51873 and previous config saved to /var/cache/conftool/dbconfig/20230829-102333-ladsgroup.json
  • 10:22 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:22 jayme: Successfully published image docker-registry.discovery.wmnet/envoy:1.23.10-2-s2
  • 10:21 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:19 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 10:17 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:16 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51872 and previous config saved to /var/cache/conftool/dbconfig/20230829-101536-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51871 and previous config saved to /var/cache/conftool/dbconfig/20230829-101515-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51870 and previous config saved to /var/cache/conftool/dbconfig/20230829-100827-ladsgroup.json
  • 10:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: host reimage
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51869 and previous config saved to /var/cache/conftool/dbconfig/20230829-100315-ladsgroup.json
  • 10:02 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: host reimage
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51868 and previous config saved to /var/cache/conftool/dbconfig/20230829-100009-ladsgroup.json
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: host reimage
  • 09:57 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: host reimage
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51867 and previous config saved to /var/cache/conftool/dbconfig/20230829-095638-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51866 and previous config saved to /var/cache/conftool/dbconfig/20230829-095617-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51865 and previous config saved to /var/cache/conftool/dbconfig/20230829-095321-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51864 and previous config saved to /var/cache/conftool/dbconfig/20230829-094809-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51862 and previous config saved to /var/cache/conftool/dbconfig/20230829-094503-ladsgroup.json
  • 09:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1123.eqiad.wmnet with OS bullseye
  • 09:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1122.eqiad.wmnet with OS bullseye
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51861 and previous config saved to /var/cache/conftool/dbconfig/20230829-094111-ladsgroup.json
  • 09:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr1-esams:xe-0/0/7
  • 09:39 cgoubert@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-esams:xe-0/0/7
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51860 and previous config saved to /var/cache/conftool/dbconfig/20230829-093539-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51859 and previous config saved to /var/cache/conftool/dbconfig/20230829-093513-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51858 and previous config saved to /var/cache/conftool/dbconfig/20230829-093303-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51857 and previous config saved to /var/cache/conftool/dbconfig/20230829-092957-ladsgroup.json
  • 09:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 09:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51856 and previous config saved to /var/cache/conftool/dbconfig/20230829-092605-ladsgroup.json
  • 09:22 moritzm: failover the ganeti master in codfw to ganeti2022
  • 09:22 jynus: restart db1205
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51855 and previous config saved to /var/cache/conftool/dbconfig/20230829-092007-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51854 and previous config saved to /var/cache/conftool/dbconfig/20230829-091756-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51853 and previous config saved to /var/cache/conftool/dbconfig/20230829-091223-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51852 and previous config saved to /var/cache/conftool/dbconfig/20230829-091202-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51851 and previous config saved to /var/cache/conftool/dbconfig/20230829-091059-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51850 and previous config saved to /var/cache/conftool/dbconfig/20230829-090501-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P51849 and previous config saved to /var/cache/conftool/dbconfig/20230829-085656-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51848 and previous config saved to /var/cache/conftool/dbconfig/20230829-084955-ladsgroup.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P51847 and previous config saved to /var/cache/conftool/dbconfig/20230829-084150-ladsgroup.json
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51846 and previous config saved to /var/cache/conftool/dbconfig/20230829-083223-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51845 and previous config saved to /var/cache/conftool/dbconfig/20230829-083202-ladsgroup.json
  • 08:30 claime: Restarted grafana-ldap-users-sync.service on grafana1002
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51844 and previous config saved to /var/cache/conftool/dbconfig/20230829-082644-ladsgroup.json
  • 08:26 claime: downtiming cassandra-a alerts on restbase1030.eqiad.wmnet for 14 days T344210 T344259
  • 08:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51843 and previous config saved to /var/cache/conftool/dbconfig/20230829-081655-ladsgroup.json
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 08:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51842 and previous config saved to /var/cache/conftool/dbconfig/20230829-080828-ladsgroup.json
  • 08:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51841 and previous config saved to /var/cache/conftool/dbconfig/20230829-080807-ladsgroup.json
  • 08:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 08:03 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51840 and previous config saved to /var/cache/conftool/dbconfig/20230829-080149-ladsgroup.json
  • 07:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1121.eqiad.wmnet with OS bullseye
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51839 and previous config saved to /var/cache/conftool/dbconfig/20230829-075301-ladsgroup.json
  • 07:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1120.eqiad.wmnet with OS bullseye
  • 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51838 and previous config saved to /var/cache/conftool/dbconfig/20230829-074643-ladsgroup.json
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 07:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51837 and previous config saved to /var/cache/conftool/dbconfig/20230829-073755-ladsgroup.json
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 07:31 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: host reimage
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51836 and previous config saved to /var/cache/conftool/dbconfig/20230829-072853-ladsgroup.json
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51835 and previous config saved to /var/cache/conftool/dbconfig/20230829-072832-ladsgroup.json
  • 07:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: host reimage
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 07:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 07:26 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: host reimage
  • 07:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: host reimage
  • 07:24 kartik@deploy1002: Finished scap: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669) (duration: 21m 02s)
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51834 and previous config saved to /var/cache/conftool/dbconfig/20230829-072249-ladsgroup.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 07:14 kartik@deploy1002: kartik: Continuing with sync
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51833 and previous config saved to /var/cache/conftool/dbconfig/20230829-071326-ladsgroup.json
  • 07:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1121.eqiad.wmnet with OS bullseye
  • 07:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1120.eqiad.wmnet with OS bullseye
  • 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 07:07 kartik@deploy1002: kartik: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51832 and previous config saved to /var/cache/conftool/dbconfig/20230829-070443-ladsgroup.json
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 07:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51831 and previous config saved to /var/cache/conftool/dbconfig/20230829-070422-ladsgroup.json
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669)
  • 06:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1119.eqiad.wmnet with OS bullseye
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51830 and previous config saved to /var/cache/conftool/dbconfig/20230829-065819-ladsgroup.json
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 06:57 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1118.eqiad.wmnet with OS bullseye
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51829 and previous config saved to /var/cache/conftool/dbconfig/20230829-065525-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51828 and previous config saved to /var/cache/conftool/dbconfig/20230829-065515-ladsgroup.json
  • 06:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51827 and previous config saved to /var/cache/conftool/dbconfig/20230829-065059-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51826 and previous config saved to /var/cache/conftool/dbconfig/20230829-065038-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51825 and previous config saved to /var/cache/conftool/dbconfig/20230829-064916-ladsgroup.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51824 and previous config saved to /var/cache/conftool/dbconfig/20230829-064313-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51823 and previous config saved to /var/cache/conftool/dbconfig/20230829-064009-ladsgroup.json
  • 06:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: host reimage
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P51822 and previous config saved to /var/cache/conftool/dbconfig/20230829-063531-ladsgroup.json
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51821 and previous config saved to /var/cache/conftool/dbconfig/20230829-063410-ladsgroup.json
  • 06:33 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: host reimage
  • 06:31 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: host reimage
  • 06:30 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: host reimage
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51820 and previous config saved to /var/cache/conftool/dbconfig/20230829-062532-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51819 and previous config saved to /var/cache/conftool/dbconfig/20230829-062511-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51818 and previous config saved to /var/cache/conftool/dbconfig/20230829-062502-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P51817 and previous config saved to /var/cache/conftool/dbconfig/20230829-062025-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51816 and previous config saved to /var/cache/conftool/dbconfig/20230829-061904-ladsgroup.json
  • 06:17 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1119.eqiad.wmnet with OS bullseye
  • 06:17 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1118.eqiad.wmnet with OS bullseye
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P51815 and previous config saved to /var/cache/conftool/dbconfig/20230829-061005-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51814 and previous config saved to /var/cache/conftool/dbconfig/20230829-060956-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51813 and previous config saved to /var/cache/conftool/dbconfig/20230829-060519-ladsgroup.json
  • 06:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51812 and previous config saved to /var/cache/conftool/dbconfig/20230829-060047-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P51811 and previous config saved to /var/cache/conftool/dbconfig/20230829-055459-ladsgroup.json
  • 05:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51810 and previous config saved to /var/cache/conftool/dbconfig/20230829-054405-ladsgroup.json
  • 05:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51809 and previous config saved to /var/cache/conftool/dbconfig/20230829-053953-ladsgroup.json
  • 05:39 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P51808 and previous config saved to /var/cache/conftool/dbconfig/20230829-052859-ladsgroup.json
  • 05:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1117.eqiad.wmnet with OS bullseye
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51807 and previous config saved to /var/cache/conftool/dbconfig/20230829-052222-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P51806 and previous config saved to /var/cache/conftool/dbconfig/20230829-051353-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51805 and previous config saved to /var/cache/conftool/dbconfig/20230829-045847-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51804 and previous config saved to /var/cache/conftool/dbconfig/20230829-044049-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 03:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51803 and previous config saved to /var/cache/conftool/dbconfig/20230829-034540-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51802 and previous config saved to /var/cache/conftool/dbconfig/20230829-034530-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51801 and previous config saved to /var/cache/conftool/dbconfig/20230829-034509-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51800 and previous config saved to /var/cache/conftool/dbconfig/20230829-033002-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51799 and previous config saved to /var/cache/conftool/dbconfig/20230829-031456-ladsgroup.json
  • 03:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bullseye
  • 03:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 03:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51798 and previous config saved to /var/cache/conftool/dbconfig/20230829-025950-ladsgroup.json
  • 02:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 02:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51797 and previous config saved to /var/cache/conftool/dbconfig/20230829-014452-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P51796 and previous config saved to /var/cache/conftool/dbconfig/20230829-012946-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P51795 and previous config saved to /var/cache/conftool/dbconfig/20230829-011440-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51794 and previous config saved to /var/cache/conftool/dbconfig/20230829-005933-ladsgroup.json
  • 00:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51793 and previous config saved to /var/cache/conftool/dbconfig/20230829-004925-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51792 and previous config saved to /var/cache/conftool/dbconfig/20230829-004217-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51791 and previous config saved to /var/cache/conftool/dbconfig/20230829-004207-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51790 and previous config saved to /var/cache/conftool/dbconfig/20230829-003418-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51789 and previous config saved to /var/cache/conftool/dbconfig/20230829-002700-ladsgroup.json
  • 00:22 eileen: civicrm upgraded from 6a2cdf10 to d13e6e0c
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51788 and previous config saved to /var/cache/conftool/dbconfig/20230829-001912-ladsgroup.json
  • 00:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51787 and previous config saved to /var/cache/conftool/dbconfig/20230829-001154-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51786 and previous config saved to /var/cache/conftool/dbconfig/20230829-000406-ladsgroup.json

2023-08-28

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51785 and previous config saved to /var/cache/conftool/dbconfig/20230828-235648-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51784 and previous config saved to /var/cache/conftool/dbconfig/20230828-232344-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 23:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 23:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:11 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 23:11 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51783 and previous config saved to /var/cache/conftool/dbconfig/20230828-225306-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51782 and previous config saved to /var/cache/conftool/dbconfig/20230828-223800-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51781 and previous config saved to /var/cache/conftool/dbconfig/20230828-223740-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51780 and previous config saved to /var/cache/conftool/dbconfig/20230828-223719-ladsgroup.json
  • 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51779 and previous config saved to /var/cache/conftool/dbconfig/20230828-222254-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51778 and previous config saved to /var/cache/conftool/dbconfig/20230828-222212-ladsgroup.json
  • 22:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:14 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51777 and previous config saved to /var/cache/conftool/dbconfig/20230828-220747-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51776 and previous config saved to /var/cache/conftool/dbconfig/20230828-220706-ladsgroup.json
  • 21:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: T337296 restart services for new federation endpoint (duration: 01m 12s)
  • 21:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: T337296 restart services for new federation endpoint
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51775 and previous config saved to /var/cache/conftool/dbconfig/20230828-215200-ladsgroup.json
  • 21:45 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51774 and previous config saved to /var/cache/conftool/dbconfig/20230828-214409-ladsgroup.json
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51773 and previous config saved to /var/cache/conftool/dbconfig/20230828-212903-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51772 and previous config saved to /var/cache/conftool/dbconfig/20230828-212733-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51771 and previous config saved to /var/cache/conftool/dbconfig/20230828-212712-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51770 and previous config saved to /var/cache/conftool/dbconfig/20230828-212647-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51769 and previous config saved to /var/cache/conftool/dbconfig/20230828-212637-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51768 and previous config saved to /var/cache/conftool/dbconfig/20230828-211357-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51767 and previous config saved to /var/cache/conftool/dbconfig/20230828-211206-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51766 and previous config saved to /var/cache/conftool/dbconfig/20230828-211131-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51765 and previous config saved to /var/cache/conftool/dbconfig/20230828-205851-ladsgroup.json
  • 20:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51764 and previous config saved to /var/cache/conftool/dbconfig/20230828-205700-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51763 and previous config saved to /var/cache/conftool/dbconfig/20230828-205625-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51762 and previous config saved to /var/cache/conftool/dbconfig/20230828-204153-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51761 and previous config saved to /var/cache/conftool/dbconfig/20230828-204119-ladsgroup.json
  • 20:27 urandom: clear pre-upgrade aqs snapshots — T339299
  • 20:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51760 and previous config saved to /var/cache/conftool/dbconfig/20230828-202206-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51759 and previous config saved to /var/cache/conftool/dbconfig/20230828-202145-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51758 and previous config saved to /var/cache/conftool/dbconfig/20230828-200639-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51757 and previous config saved to /var/cache/conftool/dbconfig/20230828-200415-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 20:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51756 and previous config saved to /var/cache/conftool/dbconfig/20230828-200354-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51755 and previous config saved to /var/cache/conftool/dbconfig/20230828-195132-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51754 and previous config saved to /var/cache/conftool/dbconfig/20230828-194848-ladsgroup.json
  • 19:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51753 and previous config saved to /var/cache/conftool/dbconfig/20230828-193626-ladsgroup.json
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1004.eqiad.wmnet with OS bullseye
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51752 and previous config saved to /var/cache/conftool/dbconfig/20230828-193511-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51751 and previous config saved to /var/cache/conftool/dbconfig/20230828-193501-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51750 and previous config saved to /var/cache/conftool/dbconfig/20230828-193342-ladsgroup.json
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51749 and previous config saved to /var/cache/conftool/dbconfig/20230828-191955-ladsgroup.json
  • 19:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51748 and previous config saved to /var/cache/conftool/dbconfig/20230828-191836-ladsgroup.json
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: host reimage
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: host reimage
  • 19:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51747 and previous config saved to /var/cache/conftool/dbconfig/20230828-190449-ladsgroup.json
  • 19:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1004.eqiad.wmnet with OS bullseye
  • 18:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1004.eqiad.wmnet
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51746 and previous config saved to /var/cache/conftool/dbconfig/20230828-185924-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51745 and previous config saved to /var/cache/conftool/dbconfig/20230828-185903-ladsgroup.json
  • 18:57 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:57 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:55 inflatador: bking@cumin1001 depool wdqs1004 for firmware update
  • 18:53 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 18:52 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1004.eqiad.wmnet
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51744 and previous config saved to /var/cache/conftool/dbconfig/20230828-184943-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51743 and previous config saved to /var/cache/conftool/dbconfig/20230828-184357-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51742 and previous config saved to /var/cache/conftool/dbconfig/20230828-184149-ladsgroup.json
  • 18:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51741 and previous config saved to /var/cache/conftool/dbconfig/20230828-182851-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P51740 and previous config saved to /var/cache/conftool/dbconfig/20230828-182642-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51739 and previous config saved to /var/cache/conftool/dbconfig/20230828-182104-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51738 and previous config saved to /var/cache/conftool/dbconfig/20230828-182025-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51737 and previous config saved to /var/cache/conftool/dbconfig/20230828-181427-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51736 and previous config saved to /var/cache/conftool/dbconfig/20230828-181417-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51735 and previous config saved to /var/cache/conftool/dbconfig/20230828-181344-ladsgroup.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P51734 and previous config saved to /var/cache/conftool/dbconfig/20230828-181136-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51733 and previous config saved to /var/cache/conftool/dbconfig/20230828-180519-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51732 and previous config saved to /var/cache/conftool/dbconfig/20230828-175911-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51731 and previous config saved to /var/cache/conftool/dbconfig/20230828-175630-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51730 and previous config saved to /var/cache/conftool/dbconfig/20230828-175151-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51729 and previous config saved to /var/cache/conftool/dbconfig/20230828-175019-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51728 and previous config saved to /var/cache/conftool/dbconfig/20230828-175013-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51727 and previous config saved to /var/cache/conftool/dbconfig/20230828-174954-ladsgroup.json
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51726 and previous config saved to /var/cache/conftool/dbconfig/20230828-174404-ladsgroup.json
  • 17:39 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031) (duration: 07m 41s)
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51725 and previous config saved to /var/cache/conftool/dbconfig/20230828-173726-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51724 and previous config saved to /var/cache/conftool/dbconfig/20230828-173650-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P51723 and previous config saved to /var/cache/conftool/dbconfig/20230828-173645-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51722 and previous config saved to /var/cache/conftool/dbconfig/20230828-173506-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P51721 and previous config saved to /var/cache/conftool/dbconfig/20230828-173448-ladsgroup.json
  • 17:34 taavi@deploy1002: taavi: Continuing with sync
  • 17:33 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD opti
  • 17:32 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031)
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51720 and previous config saved to /var/cache/conftool/dbconfig/20230828-172858-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51719 and previous config saved to /var/cache/conftool/dbconfig/20230828-172143-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P51718 and previous config saved to /var/cache/conftool/dbconfig/20230828-172138-ladsgroup.json
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P51717 and previous config saved to /var/cache/conftool/dbconfig/20230828-171942-ladsgroup.json
  • 17:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51716 and previous config saved to /var/cache/conftool/dbconfig/20230828-170637-ladsgroup.json
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51715 and previous config saved to /var/cache/conftool/dbconfig/20230828-170632-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51714 and previous config saved to /var/cache/conftool/dbconfig/20230828-170435-ladsgroup.json
  • 17:00 inflatador: bking@cumin1001 depool wdqs1005 for decom T344198
  • 17:00 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs1005.eqiad.wmnet
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51713 and previous config saved to /var/cache/conftool/dbconfig/20230828-165906-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51712 and previous config saved to /var/cache/conftool/dbconfig/20230828-165846-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51711 and previous config saved to /var/cache/conftool/dbconfig/20230828-165839-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51710 and previous config saved to /var/cache/conftool/dbconfig/20230828-165824-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51709 and previous config saved to /var/cache/conftool/dbconfig/20230828-165730-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51708 and previous config saved to /var/cache/conftool/dbconfig/20230828-165706-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51707 and previous config saved to /var/cache/conftool/dbconfig/20230828-165131-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51706 and previous config saved to /var/cache/conftool/dbconfig/20230828-164359-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51705 and previous config saved to /var/cache/conftool/dbconfig/20230828-164349-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P51704 and previous config saved to /var/cache/conftool/dbconfig/20230828-164332-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51703 and previous config saved to /var/cache/conftool/dbconfig/20230828-164318-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P51702 and previous config saved to /var/cache/conftool/dbconfig/20230828-164200-ladsgroup.json
  • 16:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51701 and previous config saved to /var/cache/conftool/dbconfig/20230828-162843-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P51700 and previous config saved to /var/cache/conftool/dbconfig/20230828-162826-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51699 and previous config saved to /var/cache/conftool/dbconfig/20230828-162812-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P51698 and previous config saved to /var/cache/conftool/dbconfig/20230828-162654-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51697 and previous config saved to /var/cache/conftool/dbconfig/20230828-162005-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51696 and previous config saved to /var/cache/conftool/dbconfig/20230828-161406-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51695 and previous config saved to /var/cache/conftool/dbconfig/20230828-161345-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51694 and previous config saved to /var/cache/conftool/dbconfig/20230828-161337-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51693 and previous config saved to /var/cache/conftool/dbconfig/20230828-161320-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51692 and previous config saved to /var/cache/conftool/dbconfig/20230828-161306-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51691 and previous config saved to /var/cache/conftool/dbconfig/20230828-161147-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51690 and previous config saved to /var/cache/conftool/dbconfig/20230828-160709-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51689 and previous config saved to /var/cache/conftool/dbconfig/20230828-160639-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51688 and previous config saved to /var/cache/conftool/dbconfig/20230828-160546-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51687 and previous config saved to /var/cache/conftool/dbconfig/20230828-160522-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P51686 and previous config saved to /var/cache/conftool/dbconfig/20230828-160459-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51685 and previous config saved to /var/cache/conftool/dbconfig/20230828-155839-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51684 and previous config saved to /var/cache/conftool/dbconfig/20230828-155830-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P51683 and previous config saved to /var/cache/conftool/dbconfig/20230828-155133-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P51682 and previous config saved to /var/cache/conftool/dbconfig/20230828-155016-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P51681 and previous config saved to /var/cache/conftool/dbconfig/20230828-154953-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51680 and previous config saved to /var/cache/conftool/dbconfig/20230828-154327-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51679 and previous config saved to /var/cache/conftool/dbconfig/20230828-153655-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51678 and previous config saved to /var/cache/conftool/dbconfig/20230828-153634-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P51677 and previous config saved to /var/cache/conftool/dbconfig/20230828-153627-ladsgroup.json
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P51676 and previous config saved to /var/cache/conftool/dbconfig/20230828-153510-ladsgroup.json
  • 15:35 fabfur: enable puppet and start pybal on lvs6001 (T344587)
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51675 and previous config saved to /var/cache/conftool/dbconfig/20230828-153447-ladsgroup.json
  • 15:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 15:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51674 and previous config saved to /var/cache/conftool/dbconfig/20230828-152948-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51673 and previous config saved to /var/cache/conftool/dbconfig/20230828-152925-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51672 and previous config saved to /var/cache/conftool/dbconfig/20230828-152820-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P51671 and previous config saved to /var/cache/conftool/dbconfig/20230828-152128-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51670 and previous config saved to /var/cache/conftool/dbconfig/20230828-152121-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51669 and previous config saved to /var/cache/conftool/dbconfig/20230828-152004-ladsgroup.json
  • 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 fabfur: disable puppet and stop pybal on lvs6001 for reboot (T344587)
  • 15:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51668 and previous config saved to /var/cache/conftool/dbconfig/20230828-151511-ladsgroup.json
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51667 and previous config saved to /var/cache/conftool/dbconfig/20230828-151446-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P51666 and previous config saved to /var/cache/conftool/dbconfig/20230828-151418-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51665 and previous config saved to /var/cache/conftool/dbconfig/20230828-151300-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51664 and previous config saved to /var/cache/conftool/dbconfig/20230828-151236-ladsgroup.json
  • 15:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P51663 and previous config saved to /var/cache/conftool/dbconfig/20230828-150622-ladsgroup.json
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:05 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P51662 and previous config saved to /var/cache/conftool/dbconfig/20230828-145940-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51661 and previous config saved to /var/cache/conftool/dbconfig/20230828-145921-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P51660 and previous config saved to /var/cache/conftool/dbconfig/20230828-145912-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P51659 and previous config saved to /var/cache/conftool/dbconfig/20230828-145730-ladsgroup.json
  • 14:55 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 14:54 claime: bounced ferm.service on ml-serve1008
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51658 and previous config saved to /var/cache/conftool/dbconfig/20230828-145116-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51657 and previous config saved to /var/cache/conftool/dbconfig/20230828-144924-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51656 and previous config saved to /var/cache/conftool/dbconfig/20230828-144903-ladsgroup.json
  • 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P51655 and previous config saved to /var/cache/conftool/dbconfig/20230828-144433-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51654 and previous config saved to /var/cache/conftool/dbconfig/20230828-144406-ladsgroup.json
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P51653 and previous config saved to /var/cache/conftool/dbconfig/20230828-144224-ladsgroup.json
  • 14:40 fabfur: enable puppet and start pybal on lvs6002 (T344587)
  • 14:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51652 and previous config saved to /var/cache/conftool/dbconfig/20230828-143808-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 14:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51651 and previous config saved to /var/cache/conftool/dbconfig/20230828-143453-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51650 and previous config saved to /var/cache/conftool/dbconfig/20230828-143357-ladsgroup.json
  • 14:32 bblack: esams cp clusters: rolling restarts of varnish-frontend ~1h apart over the next ~8h, to apply memory sizing change from: https://gerrit.wikimedia.org/r/c/operations/puppet/+/952866/ (earlier run only did 1 host per cluster before we changed direction!)
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1027.eqiad.wmnet
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1027.eqiad.wmnet
  • 14:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51649 and previous config saved to /var/cache/conftool/dbconfig/20230828-142927-ladsgroup.json
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51648 and previous config saved to /var/cache/conftool/dbconfig/20230828-142718-ladsgroup.json
  • 14:25 claime: bounced ferm.service on ml-serve1007
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51647 and previous config saved to /var/cache/conftool/dbconfig/20230828-142105-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51646 and previous config saved to /var/cache/conftool/dbconfig/20230828-142056-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T344589)', diff saved to https://phabricator.wikimedia.org/P51645 and previous config saved to /var/cache/conftool/dbconfig/20230828-142034-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51644 and previous config saved to /var/cache/conftool/dbconfig/20230828-142033-ladsgroup.json
  • 14:20 fabfur: disable puppet and stop pybal on lvs6002 for reboot (T344587)
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030', diff saved to https://phabricator.wikimedia.org/P51643 and previous config saved to /var/cache/conftool/dbconfig/20230828-141946-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51642 and previous config saved to /var/cache/conftool/dbconfig/20230828-141851-ladsgroup.json
  • 14:16 fabfur: enable puppet and start pybal on lvs6003 (T344587)
  • 14:16 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51641 and previous config saved to /var/cache/conftool/dbconfig/20230828-141505-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 14:12 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 14:11 fabfur: disable puppet and stop pybal on lvs6003 for reboot (T344587)
  • 14:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:07 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P51640 and previous config saved to /var/cache/conftool/dbconfig/20230828-140528-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P51639 and previous config saved to /var/cache/conftool/dbconfig/20230828-140527-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030', diff saved to https://phabricator.wikimedia.org/P51638 and previous config saved to /var/cache/conftool/dbconfig/20230828-140440-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51637 and previous config saved to /var/cache/conftool/dbconfig/20230828-140345-ladsgroup.json
  • 14:02 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:02 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:51 moritzm: bounce ferm on ml-serve1006
  • 13:50 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P51636 and previous config saved to /var/cache/conftool/dbconfig/20230828-135021-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P51635 and previous config saved to /var/cache/conftool/dbconfig/20230828-135021-ladsgroup.json
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51634 and previous config saved to /var/cache/conftool/dbconfig/20230828-134934-ladsgroup.json
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51633 and previous config saved to /var/cache/conftool/dbconfig/20230828-134137-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51632 and previous config saved to /var/cache/conftool/dbconfig/20230828-133756-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51631 and previous config saved to /var/cache/conftool/dbconfig/20230828-133514-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51630 and previous config saved to /var/cache/conftool/dbconfig/20230828-133040-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1030.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1030.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51629 and previous config saved to /var/cache/conftool/dbconfig/20230828-133016-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51628 and previous config saved to /var/cache/conftool/dbconfig/20230828-132724-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51627 and previous config saved to /var/cache/conftool/dbconfig/20230828-132703-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51626 and previous config saved to /var/cache/conftool/dbconfig/20230828-132655-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T344589)', diff saved to https://phabricator.wikimedia.org/P51625 and previous config saved to /var/cache/conftool/dbconfig/20230828-132648-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51624 and previous config saved to /var/cache/conftool/dbconfig/20230828-132632-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51623 and previous config saved to /var/cache/conftool/dbconfig/20230828-132623-ladsgroup.json
  • 13:24 fabfur: enable puppet and start pybal on lvs5004 (T344587)
  • 13:23 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51622 and previous config saved to /var/cache/conftool/dbconfig/20230828-132250-ladsgroup.json
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 13:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P51621 and previous config saved to /var/cache/conftool/dbconfig/20230828-131510-ladsgroup.json
  • 13:14 urbanecm@deploy1002: Finished scap: Backport for Revert "ltwiki: Disable Growth features" (T344013) (duration: 09m 04s)
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51620 and previous config saved to /var/cache/conftool/dbconfig/20230828-131157-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P51619 and previous config saved to /var/cache/conftool/dbconfig/20230828-131125-ladsgroup.json
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P51618 and previous config saved to /var/cache/conftool/dbconfig/20230828-131117-ladsgroup.json
  • 13:08 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51617 and previous config saved to /var/cache/conftool/dbconfig/20230828-130744-ladsgroup.json
  • 13:06 urbanecm@deploy1002: urbanecm: Backport for Revert "ltwiki: Disable Growth features" (T344013) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Revert "ltwiki: Disable Growth features" (T344013)
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 13:01 bblack: esams cp clusters: rolling restarts of varnish-frontend ~1h apart over the next ~8h, to apply memory sizing change from: https://gerrit.wikimedia.org/r/c/operations/puppet/+/952555/
  • 13:01 fabfur: disable puppet and stop pybal on lvs5004 for reboot (T344587)
  • 13:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51616 and previous config saved to /var/cache/conftool/dbconfig/20230828-130012-ladsgroup.json
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P51615 and previous config saved to /var/cache/conftool/dbconfig/20230828-130004-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51614 and previous config saved to /var/cache/conftool/dbconfig/20230828-125651-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P51613 and previous config saved to /var/cache/conftool/dbconfig/20230828-125619-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P51612 and previous config saved to /var/cache/conftool/dbconfig/20230828-125610-ladsgroup.json
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51611 and previous config saved to /var/cache/conftool/dbconfig/20230828-125237-ladsgroup.json
  • 12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1002
  • 12:45 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1002
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51610 and previous config saved to /var/cache/conftool/dbconfig/20230828-124506-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51609 and previous config saved to /var/cache/conftool/dbconfig/20230828-124457-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51608 and previous config saved to /var/cache/conftool/dbconfig/20230828-124145-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51607 and previous config saved to /var/cache/conftool/dbconfig/20230828-124113-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51606 and previous config saved to /var/cache/conftool/dbconfig/20230828-124104-ladsgroup.json
  • 12:40 fabfur: enable puppet and start pybal on lvs5005 (T344587)
  • 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add puppetserver1002 - jbond@cumin1001"
  • 12:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
  • 12:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add puppetserver1002 - jbond@cumin1001"
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51605 and previous config saved to /var/cache/conftool/dbconfig/20230828-123917-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51604 and previous config saved to /var/cache/conftool/dbconfig/20230828-123847-ladsgroup.json
  • 12:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51603 and previous config saved to /var/cache/conftool/dbconfig/20230828-123452-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51602 and previous config saved to /var/cache/conftool/dbconfig/20230828-123444-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:34 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device cloudsw1-b1-codfw
  • 12:34 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:33 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:33 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51601 and previous config saved to /var/cache/conftool/dbconfig/20230828-123004-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51600 and previous config saved to /var/cache/conftool/dbconfig/20230828-123000-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 12:29 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51599 and previous config saved to /var/cache/conftool/dbconfig/20230828-122341-ladsgroup.json
  • 12:20 fabfur: disable puppet and stop pybal on lvs5005 for reboot (T344587)
  • 12:18 fabfur: enable puppet and start pybal on lvs5006 (T344587)
  • 12:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 12:15 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 12:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51598 and previous config saved to /var/cache/conftool/dbconfig/20230828-121454-ladsgroup.json
  • 12:14 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 12:14 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 12:14 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 12:13 fabfur: disable puppet and stop pybal on lvs5006 for reboot (T344587)
  • 12:13 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 12:12 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 12:11 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 12:11 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51597 and previous config saved to /var/cache/conftool/dbconfig/20230828-120835-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51596 and previous config saved to /var/cache/conftool/dbconfig/20230828-120530-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51595 and previous config saved to /var/cache/conftool/dbconfig/20230828-120509-ladsgroup.json
  • 12:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1006.eqiad.wmnet
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51594 and previous config saved to /var/cache/conftool/dbconfig/20230828-115328-ladsgroup.json
  • 11:53 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 11:50 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51593 and previous config saved to /var/cache/conftool/dbconfig/20230828-115003-ladsgroup.json
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51592 and previous config saved to /var/cache/conftool/dbconfig/20230828-114706-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51591 and previous config saved to /var/cache/conftool/dbconfig/20230828-114642-ladsgroup.json
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51590 and previous config saved to /var/cache/conftool/dbconfig/20230828-114556-ladsgroup.json
  • 11:44 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1006.eqiad.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51589 and previous config saved to /var/cache/conftool/dbconfig/20230828-113733-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51588 and previous config saved to /var/cache/conftool/dbconfig/20230828-113712-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51587 and previous config saved to /var/cache/conftool/dbconfig/20230828-113455-ladsgroup.json
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51585 and previous config saved to /var/cache/conftool/dbconfig/20230828-113136-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51584 and previous config saved to /var/cache/conftool/dbconfig/20230828-113050-ladsgroup.json
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51583 and previous config saved to /var/cache/conftool/dbconfig/20230828-112206-ladsgroup.json
  • 11:20 fabfur: enable puppet and start pybal on lvs4009 (T344587)
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51582 and previous config saved to /var/cache/conftool/dbconfig/20230828-111949-ladsgroup.json
  • 11:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51581 and previous config saved to /var/cache/conftool/dbconfig/20230828-111630-ladsgroup.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51580 and previous config saved to /var/cache/conftool/dbconfig/20230828-111544-ladsgroup.json
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:15 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
  • 11:15 kamila@deploy1002: Finished scap: base image update due to T344991 (duration: 09m 31s)
  • 11:11 moritzm: bounce ferm on ml-serve10001
  • 11:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51579 and previous config saved to /var/cache/conftool/dbconfig/20230828-110700-ladsgroup.json
  • 11:05 kamila@deploy1002: Started scap: base image update due to T344991
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51578 and previous config saved to /var/cache/conftool/dbconfig/20230828-110124-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51577 and previous config saved to /var/cache/conftool/dbconfig/20230828-110038-ladsgroup.json
  • 10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on restbase1027.eqiad.wmnet with reason: T345058 - service probes flapping
  • 10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on restbase1027.eqiad.wmnet with reason: T345058 - service probes flapping
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 10:55 fabfur: disable puppet and stop pybal on lvs4009 for reboot (T344587)
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51576 and previous config saved to /var/cache/conftool/dbconfig/20230828-105503-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51575 and previous config saved to /var/cache/conftool/dbconfig/20230828-105407-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51574 and previous config saved to /var/cache/conftool/dbconfig/20230828-105342-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51573 and previous config saved to /var/cache/conftool/dbconfig/20230828-105153-ladsgroup.json
  • 10:50 moritzm: installing exim4 bugfix updates from Bookworm point release
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51572 and previous config saved to /var/cache/conftool/dbconfig/20230828-105002-ladsgroup.json
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
  • 10:46 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51571 and previous config saved to /var/cache/conftool/dbconfig/20230828-104407-ladsgroup.json
  • 10:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:39 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51570 and previous config saved to /var/cache/conftool/dbconfig/20230828-103836-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51569 and previous config saved to /var/cache/conftool/dbconfig/20230828-103827-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51568 and previous config saved to /var/cache/conftool/dbconfig/20230828-103826-ladsgroup.json
  • 10:37 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:35 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51567 and previous config saved to /var/cache/conftool/dbconfig/20230828-103456-ladsgroup.json
  • 10:31 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 10:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51566 and previous config saved to /var/cache/conftool/dbconfig/20230828-102330-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51565 and previous config saved to /var/cache/conftool/dbconfig/20230828-102320-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027', diff saved to https://phabricator.wikimedia.org/P51564 and previous config saved to /var/cache/conftool/dbconfig/20230828-102320-ladsgroup.json
  • 10:23 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 10:23 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 10:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51563 and previous config saved to /var/cache/conftool/dbconfig/20230828-101949-ladsgroup.json
  • 10:17 fabfur: enable puppet and start pybal on lvs4008 for reboot (T344587)
  • 10:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
  • 10:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51562 and previous config saved to /var/cache/conftool/dbconfig/20230828-101426-ladsgroup.json
  • 10:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51561 and previous config saved to /var/cache/conftool/dbconfig/20230828-101405-ladsgroup.json
  • 10:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
  • 10:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51560 and previous config saved to /var/cache/conftool/dbconfig/20230828-101238-ladsgroup.json
  • 10:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:11 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 10:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 10:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:10 claime: Deploying 952812 for T344814 to mw-debug and mw-api-ext
  • 10:10 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 10:09 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51559 and previous config saved to /var/cache/conftool/dbconfig/20230828-100823-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51558 and previous config saved to /var/cache/conftool/dbconfig/20230828-100814-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027', diff saved to https://phabricator.wikimedia.org/P51557 and previous config saved to /var/cache/conftool/dbconfig/20230828-100814-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51556 and previous config saved to /var/cache/conftool/dbconfig/20230828-100443-ladsgroup.json
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51555 and previous config saved to /var/cache/conftool/dbconfig/20230828-100045-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51554 and previous config saved to /var/cache/conftool/dbconfig/20230828-100005-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51553 and previous config saved to /var/cache/conftool/dbconfig/20230828-095859-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51552 and previous config saved to /var/cache/conftool/dbconfig/20230828-095732-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51551 and previous config saved to /var/cache/conftool/dbconfig/20230828-095722-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51550 and previous config saved to /var/cache/conftool/dbconfig/20230828-095658-ladsgroup.json
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 09:54 fabfur: disable puppet and stop pybal on lvs4008 for reboot (T344587)
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51549 and previous config saved to /var/cache/conftool/dbconfig/20230828-095308-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51548 and previous config saved to /var/cache/conftool/dbconfig/20230828-095308-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51547 and previous config saved to /var/cache/conftool/dbconfig/20230828-094813-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51546 and previous config saved to /var/cache/conftool/dbconfig/20230828-094748-ladsgroup.json
  • 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51545 and previous config saved to /var/cache/conftool/dbconfig/20230828-094650-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 09:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51544 and previous config saved to /var/cache/conftool/dbconfig/20230828-094626-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51543 and previous config saved to /var/cache/conftool/dbconfig/20230828-094458-ladsgroup.json
  • 09:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51542 and previous config saved to /var/cache/conftool/dbconfig/20230828-094353-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51541 and previous config saved to /var/cache/conftool/dbconfig/20230828-094220-ladsgroup.json
  • 09:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51540 and previous config saved to /var/cache/conftool/dbconfig/20230828-094152-ladsgroup.json
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 09:41 fabfur: ignore previous message: s/codfw/ulsfo/
  • 09:39 fabfur: begin rebooting lvs hosts in codfw (T344587)
  • 09:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P51539 and previous config saved to /var/cache/conftool/dbconfig/20230828-093242-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51538 and previous config saved to /var/cache/conftool/dbconfig/20230828-093120-ladsgroup.json
  • 09:30 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51537 and previous config saved to /var/cache/conftool/dbconfig/20230828-092952-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51536 and previous config saved to /var/cache/conftool/dbconfig/20230828-092847-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51535 and previous config saved to /var/cache/conftool/dbconfig/20230828-092713-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51534 and previous config saved to /var/cache/conftool/dbconfig/20230828-092646-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P51533 and previous config saved to /var/cache/conftool/dbconfig/20230828-091735-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51532 and previous config saved to /var/cache/conftool/dbconfig/20230828-091613-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51531 and previous config saved to /var/cache/conftool/dbconfig/20230828-091446-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51530 and previous config saved to /var/cache/conftool/dbconfig/20230828-091140-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51529 and previous config saved to /var/cache/conftool/dbconfig/20230828-090819-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51528 and previous config saved to /var/cache/conftool/dbconfig/20230828-090749-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51527 and previous config saved to /var/cache/conftool/dbconfig/20230828-090318-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51526 and previous config saved to /var/cache/conftool/dbconfig/20230828-090255-ladsgroup.json
  • 09:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51525 and previous config saved to /var/cache/conftool/dbconfig/20230828-090229-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51524 and previous config saved to /var/cache/conftool/dbconfig/20230828-090107-ladsgroup.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51523 and previous config saved to /var/cache/conftool/dbconfig/20230828-085737-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51522 and previous config saved to /var/cache/conftool/dbconfig/20230828-085456-ladsgroup.json
  • 08:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51521 and previous config saved to /var/cache/conftool/dbconfig/20230828-085432-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51520 and previous config saved to /var/cache/conftool/dbconfig/20230828-085243-ladsgroup.json
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51519 and previous config saved to /var/cache/conftool/dbconfig/20230828-084947-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51518 and previous config saved to /var/cache/conftool/dbconfig/20230828-084926-ladsgroup.json
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51517 and previous config saved to /var/cache/conftool/dbconfig/20230828-084748-ladsgroup.json
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51516 and previous config saved to /var/cache/conftool/dbconfig/20230828-084710-ladsgroup.json
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51515 and previous config saved to /var/cache/conftool/dbconfig/20230828-084650-ladsgroup.json
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51514 and previous config saved to /var/cache/conftool/dbconfig/20230828-083949-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51513 and previous config saved to /var/cache/conftool/dbconfig/20230828-083926-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51512 and previous config saved to /var/cache/conftool/dbconfig/20230828-083737-ladsgroup.json
  • 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51511 and previous config saved to /var/cache/conftool/dbconfig/20230828-083420-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51510 and previous config saved to /var/cache/conftool/dbconfig/20230828-083242-ladsgroup.json
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51509 and previous config saved to /var/cache/conftool/dbconfig/20230828-083143-ladsgroup.json
  • 08:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P51508 and previous config saved to /var/cache/conftool/dbconfig/20230828-082443-ladsgroup.json
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51507 and previous config saved to /var/cache/conftool/dbconfig/20230828-082420-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51506 and previous config saved to /var/cache/conftool/dbconfig/20230828-082231-ladsgroup.json
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 08:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51505 and previous config saved to /var/cache/conftool/dbconfig/20230828-081913-ladsgroup.json
  • 08:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51504 and previous config saved to /var/cache/conftool/dbconfig/20230828-081736-ladsgroup.json
  • 08:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51503 and previous config saved to /var/cache/conftool/dbconfig/20230828-081637-ladsgroup.json
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51502 and previous config saved to /var/cache/conftool/dbconfig/20230828-081245-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51501 and previous config saved to /var/cache/conftool/dbconfig/20230828-081220-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51500 and previous config saved to /var/cache/conftool/dbconfig/20230828-081117-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51499 and previous config saved to /var/cache/conftool/dbconfig/20230828-081051-ladsgroup.json
  • 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P51498 and previous config saved to /var/cache/conftool/dbconfig/20230828-080936-ladsgroup.json
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51497 and previous config saved to /var/cache/conftool/dbconfig/20230828-080914-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51496 and previous config saved to /var/cache/conftool/dbconfig/20230828-080407-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51495 and previous config saved to /var/cache/conftool/dbconfig/20230828-080131-ladsgroup.json
  • 08:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51494 and previous config saved to /var/cache/conftool/dbconfig/20230828-080045-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 08:00 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51493 and previous config saved to /var/cache/conftool/dbconfig/20230828-080021-ladsgroup.json
  • 08:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:59 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51492 and previous config saved to /var/cache/conftool/dbconfig/20230828-075714-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51491 and previous config saved to /var/cache/conftool/dbconfig/20230828-075544-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51490 and previous config saved to /var/cache/conftool/dbconfig/20230828-075430-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51489 and previous config saved to /var/cache/conftool/dbconfig/20230828-075036-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51488 and previous config saved to /var/cache/conftool/dbconfig/20230828-075011-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51487 and previous config saved to /var/cache/conftool/dbconfig/20230828-074515-ladsgroup.json
  • 07:45 moritzm: fail over Ganeti master in codfw-test to ganeti-test2003
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51486 and previous config saved to /var/cache/conftool/dbconfig/20230828-074208-ladsgroup.json
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51485 and previous config saved to /var/cache/conftool/dbconfig/20230828-074038-ladsgroup.json
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P51484 and previous config saved to /var/cache/conftool/dbconfig/20230828-073505-ladsgroup.json
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
  • 07:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51483 and previous config saved to /var/cache/conftool/dbconfig/20230828-073009-ladsgroup.json
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51482 and previous config saved to /var/cache/conftool/dbconfig/20230828-072701-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51481 and previous config saved to /var/cache/conftool/dbconfig/20230828-072644-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51480 and previous config saved to /var/cache/conftool/dbconfig/20230828-072623-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51479 and previous config saved to /var/cache/conftool/dbconfig/20230828-072532-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51478 and previous config saved to /var/cache/conftool/dbconfig/20230828-072422-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 07:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51477 and previous config saved to /var/cache/conftool/dbconfig/20230828-072025-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51476 and previous config saved to /var/cache/conftool/dbconfig/20230828-072000-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P51475 and previous config saved to /var/cache/conftool/dbconfig/20230828-071959-ladsgroup.json
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5003.wikimedia.org
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51474 and previous config saved to /var/cache/conftool/dbconfig/20230828-071824-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51473 and previous config saved to /var/cache/conftool/dbconfig/20230828-071800-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51472 and previous config saved to /var/cache/conftool/dbconfig/20230828-071503-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P51471 and previous config saved to /var/cache/conftool/dbconfig/20230828-071117-ladsgroup.json
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5003.wikimedia.org
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51470 and previous config saved to /var/cache/conftool/dbconfig/20230828-070847-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51469 and previous config saved to /var/cache/conftool/dbconfig/20230828-070823-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51468 and previous config saved to /var/cache/conftool/dbconfig/20230828-070453-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51467 and previous config saved to /var/cache/conftool/dbconfig/20230828-070254-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51466 and previous config saved to /var/cache/conftool/dbconfig/20230828-065958-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 06:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 06:59 moritzm: installing perf updates on bullseye hosts
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P51465 and previous config saved to /var/cache/conftool/dbconfig/20230828-065611-ladsgroup.json
  • 06:55 moritzm: installing nftables bugfix update from bullseye point release
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51464 and previous config saved to /var/cache/conftool/dbconfig/20230828-065316-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51463 and previous config saved to /var/cache/conftool/dbconfig/20230828-064948-ladsgroup.json
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51462 and previous config saved to /var/cache/conftool/dbconfig/20230828-064748-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51461 and previous config saved to /var/cache/conftool/dbconfig/20230828-064105-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51460 and previous config saved to /var/cache/conftool/dbconfig/20230828-063810-ladsgroup.json
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51459 and previous config saved to /var/cache/conftool/dbconfig/20230828-063442-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51458 and previous config saved to /var/cache/conftool/dbconfig/20230828-063242-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51457 and previous config saved to /var/cache/conftool/dbconfig/20230828-062805-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51456 and previous config saved to /var/cache/conftool/dbconfig/20230828-062740-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51455 and previous config saved to /var/cache/conftool/dbconfig/20230828-062617-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51454 and previous config saved to /var/cache/conftool/dbconfig/20230828-062552-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51453 and previous config saved to /var/cache/conftool/dbconfig/20230828-062304-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51452 and previous config saved to /var/cache/conftool/dbconfig/20230828-061651-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51451 and previous config saved to /var/cache/conftool/dbconfig/20230828-061627-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51450 and previous config saved to /var/cache/conftool/dbconfig/20230828-061233-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51449 and previous config saved to /var/cache/conftool/dbconfig/20230828-061046-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51448 and previous config saved to /var/cache/conftool/dbconfig/20230828-060317-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51447 and previous config saved to /var/cache/conftool/dbconfig/20230828-060121-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51446 and previous config saved to /var/cache/conftool/dbconfig/20230828-055751-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51445 and previous config saved to /var/cache/conftool/dbconfig/20230828-055727-ladsgroup.json
  • 05:56 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to old extlinks columns in s4 (T342683) (duration: 15m 36s)
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51444 and previous config saved to /var/cache/conftool/dbconfig/20230828-055539-ladsgroup.json
  • 05:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 05:49 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to old extlinks columns in s4 (T342683) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51443 and previous config saved to /var/cache/conftool/dbconfig/20230828-054615-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51442 and previous config saved to /var/cache/conftool/dbconfig/20230828-054247-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51441 and previous config saved to /var/cache/conftool/dbconfig/20230828-054221-ladsgroup.json
  • 05:41 marostegui: failover m5-master to dbproxy1021
  • 05:41 ladsgroup@deploy1002: Started scap: Backport for Stop writing to old extlinks columns in s4 (T342683)
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51440 and previous config saved to /var/cache/conftool/dbconfig/20230828-054033-ladsgroup.json
  • 05:34 elukey: powercycle restbase1027 - stopped publishing metrics days ago, no root tty available in mgmt console
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51439 and previous config saved to /var/cache/conftool/dbconfig/20230828-053108-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51438 and previous config saved to /var/cache/conftool/dbconfig/20230828-053045-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 05:30 elukey: depool restbase1027 - a lot of ping down events registered, a check up is needed
  • 05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51437 and previous config saved to /var/cache/conftool/dbconfig/20230828-052742-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51436 and previous config saved to /var/cache/conftool/dbconfig/20230828-052610-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51435 and previous config saved to /var/cache/conftool/dbconfig/20230828-051349-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51434 and previous config saved to /var/cache/conftool/dbconfig/20230828-051237-ladsgroup.json

2023-08-27

  • 07:28 elukey: silence rdb1011:6380's Redis alert (ORES-related) for 30 days to avoid spam

2023-08-26

  • 13:07 elukey: silence rdb1011:6378's Redis alert (ORES-related) for 30 days to avoid spam

2023-08-25

  • 21:03 inflatador: bking@cumin1001 shutting off wdqs1005 in preparation for decommission T344198
  • 21:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs1005.eqiad.wmnet with reason: to be decommissioned soon
  • 21:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs1005.eqiad.wmnet with reason: to be decommissioned soon
  • 19:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 19:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 18:48 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:46 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:45 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:45 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:26 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:15 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:14 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:14 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 17:45 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:45 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 17:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be2003']
  • 17:45 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 17:43 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:43 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:41 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:41 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:39 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: push deploy after bullseye reimage T343124 (duration: 00m 19s)
  • 17:39 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: push deploy after bullseye reimage T343124
  • 17:36 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1009.eqiad.wmnet with OS bullseye
  • 17:30 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 17:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 17:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: host reimage
  • 17:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: host reimage
  • 17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 17:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 17:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1009.eqiad.wmnet with OS bullseye
  • 17:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:41 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2048.codfw.wmnet with OS bullseye
  • 16:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host moss-be2003 to CODFW - jhancock@cumin2002"
  • 16:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host moss-be2003 to CODFW - jhancock@cumin2002"
  • 16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2048.codfw.wmnet with reason: host reimage
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2048.codfw.wmnet with reason: host reimage
  • 14:52 sukhe: force run agent on A:cp-esams
  • 14:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2048.codfw.wmnet with OS bullseye
  • 14:31 claime: powercycled kubernetes2009
  • 13:40 moritzm: installing w3m security updates
  • 13:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:32 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1001.eqiad.wmnet
  • 12:32 moritzm: imported wmf-laptop 0.5.8 to apt.wikimedia.org
  • 12:27 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
  • 12:20 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
  • 12:20 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
  • 12:19 jbond: disable puppet fleet wide for reboots
  • 12:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1006.eqiad.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 12:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1006.eqiad.wmnet
  • 12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 11:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 11:39 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 11:37 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 11:29 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:29 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab-test1001.eqiad.wmnet
  • 11:24 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host phab-test1001.eqiad.wmnet
  • 11:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:16 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
  • 11:16 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:10 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
  • 11:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:10 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:10 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:00 fabfur: enabled puppet and pybal on lvs2011 (T344587)
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:59 claime: Deploying mediawiki: Add missing controls for php-fpm - T341320
  • 10:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2011.codfw.wmnet
  • 10:56 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2011.codfw.wmnet
  • 10:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:48 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:38 fabfur: disabled puppet and pybal on lvs2011 for reboot (T344587)
  • 10:28 kartik@deploy1002: Finished scap: Backport for ext.uls.interface.js: Inline isNamed() method (T344635) (duration: 14m 06s)
  • 10:22 kartik@deploy1002: abi and kartik: Continuing with sync
  • 10:15 kartik@deploy1002: abi and kartik: Backport for ext.uls.interface.js: Inline isNamed() method (T344635) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:14 kartik@deploy1002: Started scap: Backport for ext.uls.interface.js: Inline isNamed() method (T344635)
  • 10:10 fabfur: enabled puppet and pybal on lvs2012 (T344587)
  • 10:09 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2012.codfw.wmnet
  • 10:09 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 10:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2012.codfw.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 09:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:57 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org
  • 09:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org
  • 09:47 fabfur: disabling puppet and pybal on lvs2012 for reboot (T344587)
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:34 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 09:28 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 09:24 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 09:24 fabfur: enabled puppet and pybal on lvs2013 for reboot (T344587)
  • 09:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 09:19 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 09:14 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 09:14 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 09:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast5004.wikimedia.org
  • 09:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:08 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:08 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
  • 09:08 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast6003.wikimedia.org
  • 09:07 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast6003.wikimedia.org with OS bookworm
  • 09:07 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:07 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:07 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast5004.wikimedia.org
  • 09:05 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:02 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:49 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:47 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:46 fabfur: stopping puppet and pybal on lvs2013 for reboot (T344587)
  • 08:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:44 claime: mw-debug: Remove limits for tls-proxy container - T344814
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast6003.wikimedia.org with OS bookworm
  • 08:42 jnuche@deploy1002: Installation of scap version "4.58.0" completed for 1 hosts
  • 08:41 jnuche@deploy1002: Installing scap version "4.58.0" for 1 hosts
  • 08:40 fabfur: started puppet and pybal on lvs2014 (T344587)
  • 08:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 08:36 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast6003.wikimedia.org on all recursors
  • 08:35 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6003.wikimedia.org on all recursors
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:33 fabfur: stopping puppet and pybal on lvs2014 for reboot (T344587)
  • 08:23 vgutierrez: re-enabling puppet on acme-chief clients
  • 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 08:16 vgutierrez: disabling puppet on acme-chief clients prior to acmechief2001 reboot
  • 08:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6003.wikimedia.org
  • 08:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast5004.wikimedia.org
  • 08:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast5004.wikimedia.org with OS bookworm
  • 08:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 07:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast5004.wikimedia.org with OS bookworm
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5004.wikimedia.org on all recursors
  • 07:10 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5004.wikimedia.org on all recursors
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:07 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:07 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:06 moritzm: installing cups security updates
  • 07:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5004.wikimedia.org
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 06:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51427 and previous config saved to /var/cache/conftool/dbconfig/20230825-054701-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51426 and previous config saved to /var/cache/conftool/dbconfig/20230825-053156-ladsgroup.json
  • 05:28 marostegui: failover m3-master to dbproxy1020
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51425 and previous config saved to /var/cache/conftool/dbconfig/20230825-051651-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51424 and previous config saved to /var/cache/conftool/dbconfig/20230825-050147-ladsgroup.json

2023-08-24

  • 23:10 bblack: geodns: DE+GB mapped back to esams (were temporarily on drmrs)
  • 22:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:29 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 00m 15s)
  • 21:29 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:28 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 08m 18s)
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51422 and previous config saved to /var/cache/conftool/dbconfig/20230824-212554-ladsgroup.json
  • 21:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:19 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:18 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 02m 17s)
  • 21:16 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:15 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: (no justification provided) (duration: 00m 55s)
  • 21:14 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: (no justification provided)
  • 21:14 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: (no justification provided) (duration: 00m 40s)
  • 21:14 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: (no justification provided)
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51421 and previous config saved to /var/cache/conftool/dbconfig/20230824-211048-ladsgroup.json
  • 21:09 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 02m 56s)
  • 21:06 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:06 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 22m 03s)
  • 21:01 thcipriani: mwmaint1002:foreachwiki extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --create-system-user # ref. 952132
  • 20:58 thcipriani@deploy1002: Finished scap: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870) (duration: 12m 31s)
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51419 and previous config saved to /var/cache/conftool/dbconfig/20230824-205541-ladsgroup.json
  • 20:52 thcipriani@deploy1002: thcipriani and jdrewniak and krinkle: Continuing with sync
  • 20:47 thcipriani@deploy1002: thcipriani and jdrewniak and krinkle: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (
  • 20:45 thcipriani@deploy1002: Started scap: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870)
  • 20:44 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51418 and previous config saved to /var/cache/conftool/dbconfig/20230824-204035-ladsgroup.json
  • 20:37 bking@deploy1002: Finished deploy [wdqs/wdqs@2455ffd]: (no justification provided) (duration: 04m 41s)
  • 20:34 inflatador: bking@deploy1002 'scap deploy new wdqs T343856'
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51417 and previous config saved to /var/cache/conftool/dbconfig/20230824-203322-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:33 bking@deploy1002: Started deploy [wdqs/wdqs@2455ffd]: (no justification provided)
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51416 and previous config saved to /var/cache/conftool/dbconfig/20230824-202836-ladsgroup.json
  • 20:21 thcipriani@deploy1002: Finished scap: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618) (duration: 09m 58s)
  • 20:15 thcipriani@deploy1002: thcipriani and matmarex: Continuing with sync
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51415 and previous config saved to /var/cache/conftool/dbconfig/20230824-201329-ladsgroup.json
  • 20:12 thcipriani@deploy1002: thcipriani and matmarex: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 thcipriani@deploy1002: Started scap: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618)
  • 20:03 effie: enabling puppet on thanos-fe* hosts
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51414 and previous config saved to /var/cache/conftool/dbconfig/20230824-195823-ladsgroup.json
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51412 and previous config saved to /var/cache/conftool/dbconfig/20230824-194317-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51411 and previous config saved to /var/cache/conftool/dbconfig/20230824-193458-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51410 and previous config saved to /var/cache/conftool/dbconfig/20230824-193434-ladsgroup.json
  • 19:30 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 19:30 effie: pool kartotherian to eqiad and depool from codfw
  • 19:22 cstone: payments-wiki upgraded from 7bf896f8 to b25307fe
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51409 and previous config saved to /var/cache/conftool/dbconfig/20230824-191928-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51408 and previous config saved to /var/cache/conftool/dbconfig/20230824-191322-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51407 and previous config saved to /var/cache/conftool/dbconfig/20230824-190422-ladsgroup.json
  • 19:03 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.23 refs T343725
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51406 and previous config saved to /var/cache/conftool/dbconfig/20230824-185816-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51405 and previous config saved to /var/cache/conftool/dbconfig/20230824-184915-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51404 and previous config saved to /var/cache/conftool/dbconfig/20230824-184308-ladsgroup.json
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51403 and previous config saved to /var/cache/conftool/dbconfig/20230824-182802-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51402 and previous config saved to /var/cache/conftool/dbconfig/20230824-182151-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51401 and previous config saved to /var/cache/conftool/dbconfig/20230824-182128-ladsgroup.json
  • 18:20 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51400 and previous config saved to /var/cache/conftool/dbconfig/20230824-182032-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51399 and previous config saved to /var/cache/conftool/dbconfig/20230824-182006-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51398 and previous config saved to /var/cache/conftool/dbconfig/20230824-180621-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51397 and previous config saved to /var/cache/conftool/dbconfig/20230824-180500-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51396 and previous config saved to /var/cache/conftool/dbconfig/20230824-175115-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51395 and previous config saved to /var/cache/conftool/dbconfig/20230824-174954-ladsgroup.json
  • 17:36 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51394 and previous config saved to /var/cache/conftool/dbconfig/20230824-173609-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51393 and previous config saved to /var/cache/conftool/dbconfig/20230824-173448-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51391 and previous config saved to /var/cache/conftool/dbconfig/20230824-172851-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51390 and previous config saved to /var/cache/conftool/dbconfig/20230824-172820-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51389 and previous config saved to /var/cache/conftool/dbconfig/20230824-172723-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51388 and previous config saved to /var/cache/conftool/dbconfig/20230824-172658-ladsgroup.json
  • 17:23 ryankemper: [WCQS] T344882 `ryankemper@wcqs1003:~$ sudo depool`
  • 17:17 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@15ed2de]: (no justification provided) (duration: 00m 19s)
  • 17:17 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@15ed2de]: (no justification provided)
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51387 and previous config saved to /var/cache/conftool/dbconfig/20230824-171314-ladsgroup.json
  • 17:13 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51386 and previous config saved to /var/cache/conftool/dbconfig/20230824-171152-ladsgroup.json
  • 17:10 bd808: Toolhub updated to a59d37
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:07 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:06 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51385 and previous config saved to /var/cache/conftool/dbconfig/20230824-165807-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51384 and previous config saved to /var/cache/conftool/dbconfig/20230824-165646-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51383 and previous config saved to /var/cache/conftool/dbconfig/20230824-165609-ladsgroup.json
  • 16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51382 and previous config saved to /var/cache/conftool/dbconfig/20230824-164301-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51381 and previous config saved to /var/cache/conftool/dbconfig/20230824-164140-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P51380 and previous config saved to /var/cache/conftool/dbconfig/20230824-164103-ladsgroup.json
  • 16:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51378 and previous config saved to /var/cache/conftool/dbconfig/20230824-163543-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51377 and previous config saved to /var/cache/conftool/dbconfig/20230824-163519-ladsgroup.json
  • 16:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1001.eqiad.wmnet
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51376 and previous config saved to /var/cache/conftool/dbconfig/20230824-163419-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51375 and previous config saved to /var/cache/conftool/dbconfig/20230824-163347-ladsgroup.json
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:30 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P51374 and previous config saved to /var/cache/conftool/dbconfig/20230824-162556-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51373 and previous config saved to /var/cache/conftool/dbconfig/20230824-162013-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51372 and previous config saved to /var/cache/conftool/dbconfig/20230824-161841-ladsgroup.json
  • 16:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-master1001.eqiad.wmnet
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51371 and previous config saved to /var/cache/conftool/dbconfig/20230824-161050-ladsgroup.json
  • 16:10 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad
  • 16:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230824-160502-ladsgroup.json
  • 16:04 sukhe: enable puppet on A:lvs and A:esams and force run agent to merge 952247
  • 16:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51369 and previous config saved to /var/cache/conftool/dbconfig/20230824-160335-ladsgroup.json
  • 16:00 sukhe: disable puppet on A:lvs and A:esams to merge 952247
  • 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51368 and previous config saved to /var/cache/conftool/dbconfig/20230824-154956-ladsgroup.json
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51367 and previous config saved to /var/cache/conftool/dbconfig/20230824-154829-ladsgroup.json
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51366 and previous config saved to /var/cache/conftool/dbconfig/20230824-154238-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51365 and previous config saved to /var/cache/conftool/dbconfig/20230824-154102-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51364 and previous config saved to /var/cache/conftool/dbconfig/20230824-154037-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51363 and previous config saved to /var/cache/conftool/dbconfig/20230824-153835-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51362 and previous config saved to /var/cache/conftool/dbconfig/20230824-153443-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51361 and previous config saved to /var/cache/conftool/dbconfig/20230824-153422-ladsgroup.json
  • 15:27 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:26 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51360 and previous config saved to /var/cache/conftool/dbconfig/20230824-152531-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51359 and previous config saved to /var/cache/conftool/dbconfig/20230824-152329-ladsgroup.json
  • 15:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 15:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:21 effie: depool kartotherian on eqiad
  • 15:20 oblivian@deploy1002: Started scap: (no justification provided)
  • 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P51358 and previous config saved to /var/cache/conftool/dbconfig/20230824-151916-ladsgroup.json
  • 15:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase1020.eqiad.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51357 and previous config saved to /var/cache/conftool/dbconfig/20230824-151414-ladsgroup.json
  • 15:13 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:13 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:13 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51356 and previous config saved to /var/cache/conftool/dbconfig/20230824-151025-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51355 and previous config saved to /var/cache/conftool/dbconfig/20230824-150823-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P51354 and previous config saved to /var/cache/conftool/dbconfig/20230824-150410-ladsgroup.json
  • 15:03 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
  • 15:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
  • 15:02 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:02 effie: pool kartotherian on codfw
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:esams and A:wikidough
  • 15:01 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51353 and previous config saved to /var/cache/conftool/dbconfig/20230824-145909-ladsgroup.json
  • 14:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51352 and previous config saved to /var/cache/conftool/dbconfig/20230824-145519-ladsgroup.json
  • 14:54 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
  • 14:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51351 and previous config saved to /var/cache/conftool/dbconfig/20230824-145317-ladsgroup.json
  • 14:53 moritzm: installing poppler security updates
  • 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51350 and previous config saved to /var/cache/conftool/dbconfig/20230824-144903-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51349 and previous config saved to /var/cache/conftool/dbconfig/20230824-144810-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51348 and previous config saved to /var/cache/conftool/dbconfig/20230824-144801-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51347 and previous config saved to /var/cache/conftool/dbconfig/20230824-144745-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51346 and previous config saved to /var/cache/conftool/dbconfig/20230824-144737-ladsgroup.json
  • 14:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51345 and previous config saved to /var/cache/conftool/dbconfig/20230824-144404-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51344 and previous config saved to /var/cache/conftool/dbconfig/20230824-143239-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51343 and previous config saved to /var/cache/conftool/dbconfig/20230824-143231-ladsgroup.json
  • 14:31 moritzm: restarting FPM on mw canaries
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51342 and previous config saved to /var/cache/conftool/dbconfig/20230824-142900-ladsgroup.json
  • 14:23 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:esams and A:wikidough
  • 14:21 moritzm: installing openssl security updates on buster
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51341 and previous config saved to /var/cache/conftool/dbconfig/20230824-141733-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51340 and previous config saved to /var/cache/conftool/dbconfig/20230824-141725-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51339 and previous config saved to /var/cache/conftool/dbconfig/20230824-141043-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51338 and previous config saved to /var/cache/conftool/dbconfig/20230824-141022-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51337 and previous config saved to /var/cache/conftool/dbconfig/20230824-140226-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51336 and previous config saved to /var/cache/conftool/dbconfig/20230824-140218-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51335 and previous config saved to /var/cache/conftool/dbconfig/20230824-135659-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51334 and previous config saved to /var/cache/conftool/dbconfig/20230824-135636-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P51333 and previous config saved to /var/cache/conftool/dbconfig/20230824-135516-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51332 and previous config saved to /var/cache/conftool/dbconfig/20230824-135456-ladsgroup.json
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:53 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51331 and previous config saved to /var/cache/conftool/dbconfig/20230824-135004-ladsgroup.json
  • 13:48 fabfur: enabled puppet and pybal on lvs1017 (T344587)
  • 13:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 13:46 bblack: cp3075: restart varnish frontend (changing malloc storage from https://gerrit.wikimedia.org/r/c/operations/puppet/+/952207/ )
  • 13:43 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:43 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 13:43 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51330 and previous config saved to /var/cache/conftool/dbconfig/20230824-134129-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P51329 and previous config saved to /var/cache/conftool/dbconfig/20230824-134010-ladsgroup.json
  • 13:37 marostegui: failover m2-master to dbproxy1023
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51328 and previous config saved to /var/cache/conftool/dbconfig/20230824-133458-ladsgroup.json
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51327 and previous config saved to /var/cache/conftool/dbconfig/20230824-132623-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51326 and previous config saved to /var/cache/conftool/dbconfig/20230824-132504-ladsgroup.json
  • 13:23 fabfur: disabling puppet and pybal on lvs1017 for reboot (T344587)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51325 and previous config saved to /var/cache/conftool/dbconfig/20230824-131952-ladsgroup.json
  • 13:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 00m 21s)
  • 13:11 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51324 and previous config saved to /var/cache/conftool/dbconfig/20230824-131117-ladsgroup.json
  • 13:08 bblack: cp3074: restart varnish frontend (changing malloc storage from https://gerrit.wikimedia.org/r/c/operations/puppet/+/952207/ )
  • 13:05 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 01m 27s)
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51323 and previous config saved to /var/cache/conftool/dbconfig/20230824-130519-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51322 and previous config saved to /var/cache/conftool/dbconfig/20230824-130455-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51321 and previous config saved to /var/cache/conftool/dbconfig/20230824-130446-ladsgroup.json
  • 13:04 fabfur: puppet and pybal reenabled on lvs1018 (T344587)
  • 13:04 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
  • 13:04 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 13:03 marostegui: failover m1-master to dbproxy1022
  • 13:02 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 12:59 sukhe: running homer "asw1-b*27-esams*" commit "add doh300[34]"
  • 12:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51320 and previous config saved to /var/cache/conftool/dbconfig/20230824-125607-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51319 and previous config saved to /var/cache/conftool/dbconfig/20230824-125542-ladsgroup.json
  • 12:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 12:49 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51318 and previous config saved to /var/cache/conftool/dbconfig/20230824-124942-ladsgroup.json
  • 12:48 effie: depool kartotherian in eqiad
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51317 and previous config saved to /var/cache/conftool/dbconfig/20230824-124758-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51316 and previous config saved to /var/cache/conftool/dbconfig/20230824-124737-ladsgroup.json
  • 12:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1001.eqiad.wmnet
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51315 and previous config saved to /var/cache/conftool/dbconfig/20230824-124036-ladsgroup.json
  • 12:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1001.eqiad.wmnet
  • 12:35 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:34 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51314 and previous config saved to /var/cache/conftool/dbconfig/20230824-123436-ladsgroup.json
  • 12:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P51313 and previous config saved to /var/cache/conftool/dbconfig/20230824-123231-ladsgroup.json
  • 12:25 fabfur: errata corrige: not lvs1020 but lvs1018
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51312 and previous config saved to /var/cache/conftool/dbconfig/20230824-122530-ladsgroup.json
  • 12:25 fabfur: disabling puppet and pybal on lvs1020 for reboot (T344587)
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51311 and previous config saved to /var/cache/conftool/dbconfig/20230824-121930-ladsgroup.json
  • 12:18 cgoubert@deploy1002: Finished scap: Redeploying mw-on-k8s - T344904 (duration: 02m 07s)
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P51310 and previous config saved to /var/cache/conftool/dbconfig/20230824-121725-ladsgroup.json
  • 12:16 cgoubert@deploy1002: Started scap: Redeploying mw-on-k8s - T344904
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51309 and previous config saved to /var/cache/conftool/dbconfig/20230824-121158-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51308 and previous config saved to /var/cache/conftool/dbconfig/20230824-121024-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs and A:cp
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51307 and previous config saved to /var/cache/conftool/dbconfig/20230824-120611-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51306 and previous config saved to /var/cache/conftool/dbconfig/20230824-120352-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs and A:cp
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51305 and previous config saved to /var/cache/conftool/dbconfig/20230824-120218-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P51304 and previous config saved to /var/cache/conftool/dbconfig/20230824-115105-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51303 and previous config saved to /var/cache/conftool/dbconfig/20230824-115056-ladsgroup.json
  • 11:49 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:48 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P51302 and previous config saved to /var/cache/conftool/dbconfig/20230824-113559-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P51301 and previous config saved to /var/cache/conftool/dbconfig/20230824-113550-ladsgroup.json
  • 11:31 taavi: foreachwikiindblist fishbowl extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php | tee oathauth-multiple-fishbowl.log # T242031
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51300 and previous config saved to /var/cache/conftool/dbconfig/20230824-112532-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51299 and previous config saved to /var/cache/conftool/dbconfig/20230824-112507-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51298 and previous config saved to /var/cache/conftool/dbconfig/20230824-112052-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P51297 and previous config saved to /var/cache/conftool/dbconfig/20230824-112043-ladsgroup.json
  • 11:16 fabfur: lvs1019 up and running (T344587)
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51296 and previous config saved to /var/cache/conftool/dbconfig/20230824-111432-ladsgroup.json
  • 11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51295 and previous config saved to /var/cache/conftool/dbconfig/20230824-111407-ladsgroup.json
  • 11:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P51294 and previous config saved to /var/cache/conftool/dbconfig/20230824-111001-ladsgroup.json
  • 11:09 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51293 and previous config saved to /var/cache/conftool/dbconfig/20230824-110537-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51292 and previous config saved to /var/cache/conftool/dbconfig/20230824-110226-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51291 and previous config saved to /var/cache/conftool/dbconfig/20230824-110206-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P51290 and previous config saved to /var/cache/conftool/dbconfig/20230824-105900-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P51289 and previous config saved to /var/cache/conftool/dbconfig/20230824-105454-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P51288 and previous config saved to /var/cache/conftool/dbconfig/20230824-104659-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P51287 and previous config saved to /var/cache/conftool/dbconfig/20230824-104354-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51286 and previous config saved to /var/cache/conftool/dbconfig/20230824-103948-ladsgroup.json
  • 10:32 fabfur: stopping pybal and rebooting lvs1019 (T344587)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P51285 and previous config saved to /var/cache/conftool/dbconfig/20230824-103153-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51284 and previous config saved to /var/cache/conftool/dbconfig/20230824-102848-ladsgroup.json
  • 10:22 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 10:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:22 effie: pool kartotherian on codfw
  • 10:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:20 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:19 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:18 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51283 and previous config saved to /var/cache/conftool/dbconfig/20230824-101647-ladsgroup.json
  • 10:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51282 and previous config saved to /var/cache/conftool/dbconfig/20230824-101527-ladsgroup.json
  • 10:15 effie: Disable puppet on thanos-fe (eqiad), roll out cfssl on thanos-fe in codfw
  • 10:14 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51281 and previous config saved to /var/cache/conftool/dbconfig/20230824-101437-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 10:14 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51280 and previous config saved to /var/cache/conftool/dbconfig/20230824-101405-ladsgroup.json
  • 10:08 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:08 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 10:03 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51279 and previous config saved to /var/cache/conftool/dbconfig/20230824-100321-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:03 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51278 and previous config saved to /var/cache/conftool/dbconfig/20230824-100259-ladsgroup.json
  • 10:02 fabfur: end reboot of lvs1020 (pybal service enabled) (T344587)
  • 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51277 and previous config saved to /var/cache/conftool/dbconfig/20230824-100021-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P51276 and previous config saved to /var/cache/conftool/dbconfig/20230824-095858-ladsgroup.json
  • 09:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:52 fabfur: reboot lvs1020 to apply patch (T344587)
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51275 and previous config saved to /var/cache/conftool/dbconfig/20230824-095117-ladsgroup.json
  • 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1002.eqiad.wmnet
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P51274 and previous config saved to /var/cache/conftool/dbconfig/20230824-094753-ladsgroup.json
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51273 and previous config saved to /var/cache/conftool/dbconfig/20230824-094515-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P51272 and previous config saved to /var/cache/conftool/dbconfig/20230824-094352-ladsgroup.json
  • 09:42 moritzm: removed stretch-wikimedia from apt.wikimedia.org (obsolete)
  • 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51271 and previous config saved to /var/cache/conftool/dbconfig/20230824-094109-ladsgroup.json
  • 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P51270 and previous config saved to /var/cache/conftool/dbconfig/20230824-093611-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P51269 and previous config saved to /var/cache/conftool/dbconfig/20230824-093247-ladsgroup.json
  • 09:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51268 and previous config saved to /var/cache/conftool/dbconfig/20230824-093008-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51267 and previous config saved to /var/cache/conftool/dbconfig/20230824-092846-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51266 and previous config saved to /var/cache/conftool/dbconfig/20230824-092636-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51265 and previous config saved to /var/cache/conftool/dbconfig/20230824-092614-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P51264 and previous config saved to /var/cache/conftool/dbconfig/20230824-092603-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51263 and previous config saved to /var/cache/conftool/dbconfig/20230824-092147-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51262 and previous config saved to /var/cache/conftool/dbconfig/20230824-092122-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P51261 and previous config saved to /var/cache/conftool/dbconfig/20230824-092105-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51260 and previous config saved to /var/cache/conftool/dbconfig/20230824-092056-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51259 and previous config saved to /var/cache/conftool/dbconfig/20230824-091741-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P51258 and previous config saved to /var/cache/conftool/dbconfig/20230824-091108-ladsgroup.json
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P51257 and previous config saved to /var/cache/conftool/dbconfig/20230824-091057-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51256 and previous config saved to /var/cache/conftool/dbconfig/20230824-090559-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P51255 and previous config saved to /var/cache/conftool/dbconfig/20230824-090550-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51254 and previous config saved to /var/cache/conftool/dbconfig/20230824-085834-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51253 and previous config saved to /var/cache/conftool/dbconfig/20230824-085810-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P51252 and previous config saved to /var/cache/conftool/dbconfig/20230824-085602-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51251 and previous config saved to /var/cache/conftool/dbconfig/20230824-085551-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P51250 and previous config saved to /var/cache/conftool/dbconfig/20230824-085044-ladsgroup.json
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P51249 and previous config saved to /var/cache/conftool/dbconfig/20230824-084304-ladsgroup.json
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51248 and previous config saved to /var/cache/conftool/dbconfig/20230824-084303-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51247 and previous config saved to /var/cache/conftool/dbconfig/20230824-084055-ladsgroup.json
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51246 and previous config saved to /var/cache/conftool/dbconfig/20230824-083644-ladsgroup.json
  • 08:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51245 and previous config saved to /var/cache/conftool/dbconfig/20230824-083537-ladsgroup.json
  • 08:33 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031) (duration: 07m 45s)
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51244 and previous config saved to /var/cache/conftool/dbconfig/20230824-083248-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51243 and previous config saved to /var/cache/conftool/dbconfig/20230824-083226-ladsgroup.json
  • 08:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 08:30 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs and A:cp
  • 08:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 08:30 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs and A:cp
  • 08:29 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 08:28 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51242 and previous config saved to /var/cache/conftool/dbconfig/20230824-082814-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P51241 and previous config saved to /var/cache/conftool/dbconfig/20230824-082757-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51240 and previous config saved to /var/cache/conftool/dbconfig/20230824-082748-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Change db2179 groups', diff saved to https://phabricator.wikimedia.org/P51239 and previous config saved to /var/cache/conftool/dbconfig/20230824-082742-ladsgroup.json
  • 08:27 taavi@deploy1002: taavi: Continuing with sync
  • 08:27 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:25 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031)
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 08:19 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 08:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P51238 and previous config saved to /var/cache/conftool/dbconfig/20230824-081720-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2140 T344883', diff saved to https://phabricator.wikimedia.org/P51237 and previous config saved to /var/cache/conftool/dbconfig/20230824-081654-ladsgroup.json
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2179 to s4 primary T344883', diff saved to https://phabricator.wikimedia.org/P51236 and previous config saved to /var/cache/conftool/dbconfig/20230824-081442-ladsgroup.json
  • 08:14 Amir1: Starting s4 codfw failover from db2140 to db2179 - T344883
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51235 and previous config saved to /var/cache/conftool/dbconfig/20230824-081229-ladsgroup.json
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P51234 and previous config saved to /var/cache/conftool/dbconfig/20230824-080534-ladsgroup.json
  • 08:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51233 and previous config saved to /var/cache/conftool/dbconfig/20230824-080522-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51232 and previous config saved to /var/cache/conftool/dbconfig/20230824-080316-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P51231 and previous config saved to /var/cache/conftool/dbconfig/20230824-080214-ladsgroup.json
  • 08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51230 and previous config saved to /var/cache/conftool/dbconfig/20230824-075906-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 07:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51229 and previous config saved to /var/cache/conftool/dbconfig/20230824-075842-ladsgroup.json
  • 07:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51228 and previous config saved to /var/cache/conftool/dbconfig/20230824-075722-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51227 and previous config saved to /var/cache/conftool/dbconfig/20230824-075529-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51226 and previous config saved to /var/cache/conftool/dbconfig/20230824-075505-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51225 and previous config saved to /var/cache/conftool/dbconfig/20230824-075028-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P51224 and previous config saved to /var/cache/conftool/dbconfig/20230824-074810-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51223 and previous config saved to /var/cache/conftool/dbconfig/20230824-074708-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P51222 and previous config saved to /var/cache/conftool/dbconfig/20230824-074336-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 07:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51221 and previous config saved to /var/cache/conftool/dbconfig/20230824-074216-ladsgroup.json
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 07:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P51220 and previous config saved to /var/cache/conftool/dbconfig/20230824-073959-ladsgroup.json
  • 07:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:39 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:38 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51219 and previous config saved to /var/cache/conftool/dbconfig/20230824-073355-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P51218 and previous config saved to /var/cache/conftool/dbconfig/20230824-073304-ladsgroup.json
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 07:30 apergos: UTC morning backport and config deployment window complete
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P51217 and previous config saved to /var/cache/conftool/dbconfig/20230824-072829-ladsgroup.json
  • 07:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P51216 and previous config saved to /var/cache/conftool/dbconfig/20230824-072453-ladsgroup.json
  • 07:23 ariel@deploy1002: Finished scap: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816) (duration: 10m 01s)
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es2025', diff saved to https://phabricator.wikimedia.org/P51215 and previous config saved to /var/cache/conftool/dbconfig/20230824-072301-ladsgroup.json
  • 07:22 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:22 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P51214 and previous config saved to /var/cache/conftool/dbconfig/20230824-071849-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51213 and previous config saved to /var/cache/conftool/dbconfig/20230824-071757-ladsgroup.json
  • 07:17 ariel@deploy1002: zoranzoki21 and ariel: Continuing with sync
  • 07:14 ariel@deploy1002: zoranzoki21 and ariel: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51212 and previous config saved to /var/cache/conftool/dbconfig/20230824-071323-ladsgroup.json
  • 07:13 ariel@deploy1002: Started scap: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816)
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51211 and previous config saved to /var/cache/conftool/dbconfig/20230824-071204-ladsgroup.json
  • 07:09 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51210 and previous config saved to /var/cache/conftool/dbconfig/20230824-070946-ladsgroup.json
  • 07:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51209 and previous config saved to /var/cache/conftool/dbconfig/20230824-070723-ladsgroup.json
  • 07:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51208 and previous config saved to /var/cache/conftool/dbconfig/20230824-070710-ladsgroup.json
  • 07:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51207 and previous config saved to /var/cache/conftool/dbconfig/20230824-070702-ladsgroup.json
  • 07:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51206 and previous config saved to /var/cache/conftool/dbconfig/20230824-070646-ladsgroup.json
  • 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Pool es2025', diff saved to https://phabricator.wikimedia.org/P51205 and previous config saved to /var/cache/conftool/dbconfig/20230824-070417-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P51204 and previous config saved to /var/cache/conftool/dbconfig/20230824-070343-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51203 and previous config saved to /var/cache/conftool/dbconfig/20230824-070332-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51202 and previous config saved to /var/cache/conftool/dbconfig/20230824-070307-ladsgroup.json
  • 07:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 07:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P51201 and previous config saved to /var/cache/conftool/dbconfig/20230824-065658-ladsgroup.json
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P51200 and previous config saved to /var/cache/conftool/dbconfig/20230824-065155-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P51199 and previous config saved to /var/cache/conftool/dbconfig/20230824-065140-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51198 and previous config saved to /var/cache/conftool/dbconfig/20230824-064830-ladsgroup.json
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P51197 and previous config saved to /var/cache/conftool/dbconfig/20230824-064801-ladsgroup.json
  • 06:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51196 and previous config saved to /var/cache/conftool/dbconfig/20230824-064205-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P51195 and previous config saved to /var/cache/conftool/dbconfig/20230824-064152-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51194 and previous config saved to /var/cache/conftool/dbconfig/20230824-064044-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51193 and previous config saved to /var/cache/conftool/dbconfig/20230824-064030-ladsgroup.json
  • 06:40 Amir1: killed mwscript updateSpecialPages.php metawiki --override --only=Mostlinked blocking db depool
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P51192 and previous config saved to /var/cache/conftool/dbconfig/20230824-063649-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P51191 and previous config saved to /var/cache/conftool/dbconfig/20230824-063633-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P51190 and previous config saved to /var/cache/conftool/dbconfig/20230824-063255-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2179 with weight 0 T344883', diff saved to https://phabricator.wikimedia.org/P51189 and previous config saved to /var/cache/conftool/dbconfig/20230824-063240-ladsgroup.json
  • 06:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51188 and previous config saved to /var/cache/conftool/dbconfig/20230824-062824-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51187 and previous config saved to /var/cache/conftool/dbconfig/20230824-062802-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51186 and previous config saved to /var/cache/conftool/dbconfig/20230824-062645-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P51185 and previous config saved to /var/cache/conftool/dbconfig/20230824-062523-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51184 and previous config saved to /var/cache/conftool/dbconfig/20230824-062143-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51183 and previous config saved to /var/cache/conftool/dbconfig/20230824-062127-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51182 and previous config saved to /var/cache/conftool/dbconfig/20230824-061813-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51181 and previous config saved to /var/cache/conftool/dbconfig/20230824-061748-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51180 and previous config saved to /var/cache/conftool/dbconfig/20230824-061413-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51179 and previous config saved to /var/cache/conftool/dbconfig/20230824-061348-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P51178 and previous config saved to /var/cache/conftool/dbconfig/20230824-061256-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P51177 and previous config saved to /var/cache/conftool/dbconfig/20230824-061017-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51176 and previous config saved to /var/cache/conftool/dbconfig/20230824-060924-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T344881', diff saved to https://phabricator.wikimedia.org/P51175 and previous config saved to /var/cache/conftool/dbconfig/20230824-060647-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T344881', diff saved to https://phabricator.wikimedia.org/P51174 and previous config saved to /var/cache/conftool/dbconfig/20230824-060245-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T344881', diff saved to https://phabricator.wikimedia.org/P51173 and previous config saved to /var/cache/conftool/dbconfig/20230824-060157-ladsgroup.json
  • 06:01 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T344881
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51172 and previous config saved to /var/cache/conftool/dbconfig/20230824-055846-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P51171 and previous config saved to /var/cache/conftool/dbconfig/20230824-055842-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P51170 and previous config saved to /var/cache/conftool/dbconfig/20230824-055750-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51169 and previous config saved to /var/cache/conftool/dbconfig/20230824-055511-ladsgroup.json
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51168 and previous config saved to /var/cache/conftool/dbconfig/20230824-054726-ladsgroup.json
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51167 and previous config saved to /var/cache/conftool/dbconfig/20230824-054656-ladsgroup.json
  • 05:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P51166 and previous config saved to /var/cache/conftool/dbconfig/20230824-054335-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51165 and previous config saved to /var/cache/conftool/dbconfig/20230824-054244-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51164 and previous config saved to /var/cache/conftool/dbconfig/20230824-054044-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51163 and previous config saved to /var/cache/conftool/dbconfig/20230824-054033-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51162 and previous config saved to /var/cache/conftool/dbconfig/20230824-054023-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51161 and previous config saved to /var/cache/conftool/dbconfig/20230824-054005-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P51160 and previous config saved to /var/cache/conftool/dbconfig/20230824-053150-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51159 and previous config saved to /var/cache/conftool/dbconfig/20230824-052829-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P51158 and previous config saved to /var/cache/conftool/dbconfig/20230824-052517-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P51157 and previous config saved to /var/cache/conftool/dbconfig/20230824-052459-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51156 and previous config saved to /var/cache/conftool/dbconfig/20230824-052208-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51155 and previous config saved to /var/cache/conftool/dbconfig/20230824-052138-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T344881', diff saved to https://phabricator.wikimedia.org/P51154 and previous config saved to /var/cache/conftool/dbconfig/20230824-051951-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344881
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344881
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P51153 and previous config saved to /var/cache/conftool/dbconfig/20230824-051644-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51152 and previous config saved to /var/cache/conftool/dbconfig/20230824-051259-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P51151 and previous config saved to /var/cache/conftool/dbconfig/20230824-051010-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P51150 and previous config saved to /var/cache/conftool/dbconfig/20230824-050953-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51149 and previous config saved to /var/cache/conftool/dbconfig/20230824-050632-ladsgroup.json
  • 05:01 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old columns of extlinks in enwiki (T342683) (duration: 08m 16s)
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51148 and previous config saved to /var/cache/conftool/dbconfig/20230824-050137-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P51147 and previous config saved to /var/cache/conftool/dbconfig/20230824-045753-ladsgroup.json
  • 04:56 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 04:55 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old columns of extlinks in enwiki (T342683) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51146 and previous config saved to /var/cache/conftool/dbconfig/20230824-045504-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51145 and previous config saved to /var/cache/conftool/dbconfig/20230824-045447-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51144 and previous config saved to /var/cache/conftool/dbconfig/20230824-045352-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old columns of extlinks in enwiki (T342683)
  • 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51143 and previous config saved to /var/cache/conftool/dbconfig/20230824-045326-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51142 and previous config saved to /var/cache/conftool/dbconfig/20230824-045236-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51141 and previous config saved to /var/cache/conftool/dbconfig/20230824-045215-ladsgroup.json
  • 04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51140 and previous config saved to /var/cache/conftool/dbconfig/20230824-045125-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P51139 and previous config saved to /var/cache/conftool/dbconfig/20230824-044247-ladsgroup.json
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51138 and previous config saved to /var/cache/conftool/dbconfig/20230824-043820-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P51137 and previous config saved to /var/cache/conftool/dbconfig/20230824-043709-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51136 and previous config saved to /var/cache/conftool/dbconfig/20230824-043619-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51135 and previous config saved to /var/cache/conftool/dbconfig/20230824-043334-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51134 and previous config saved to /var/cache/conftool/dbconfig/20230824-042740-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51133 and previous config saved to /var/cache/conftool/dbconfig/20230824-042314-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P51132 and previous config saved to /var/cache/conftool/dbconfig/20230824-042202-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P51131 and previous config saved to /var/cache/conftool/dbconfig/20230824-041827-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51130 and previous config saved to /var/cache/conftool/dbconfig/20230824-041759-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51129 and previous config saved to /var/cache/conftool/dbconfig/20230824-041537-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51128 and previous config saved to /var/cache/conftool/dbconfig/20230824-041421-ladsgroup.json
  • 04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51127 and previous config saved to /var/cache/conftool/dbconfig/20230824-040808-ladsgroup.json
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51126 and previous config saved to /var/cache/conftool/dbconfig/20230824-040656-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P51125 and previous config saved to /var/cache/conftool/dbconfig/20230824-040321-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P51124 and previous config saved to /var/cache/conftool/dbconfig/20230824-040253-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51123 and previous config saved to /var/cache/conftool/dbconfig/20230824-040139-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51122 and previous config saved to /var/cache/conftool/dbconfig/20230824-034815-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P51121 and previous config saved to /var/cache/conftool/dbconfig/20230824-034747-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51120 and previous config saved to /var/cache/conftool/dbconfig/20230824-034056-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 03:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51119 and previous config saved to /var/cache/conftool/dbconfig/20230824-033240-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51118 and previous config saved to /var/cache/conftool/dbconfig/20230824-032633-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51117 and previous config saved to /var/cache/conftool/dbconfig/20230824-032545-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51116 and previous config saved to /var/cache/conftool/dbconfig/20230824-032508-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51115 and previous config saved to /var/cache/conftool/dbconfig/20230824-032443-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51114 and previous config saved to /var/cache/conftool/dbconfig/20230824-030937-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51113 and previous config saved to /var/cache/conftool/dbconfig/20230824-025431-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 02:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51112 and previous config saved to /var/cache/conftool/dbconfig/20230824-023924-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1178.eqiad.wmnet with reason: Host needs maint
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1178.eqiad.wmnet with reason: Host needs maint
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51111 and previous config saved to /var/cache/conftool/dbconfig/20230824-023407-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:24 eileen: civicrm upgraded from 9afa91fb to 6a2cdf10
  • 00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2040.codfw.wmnet with OS bullseye
  • 00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2023-08-23

  • 23:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2040.codfw.wmnet with reason: host reimage
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2040.codfw.wmnet with reason: host reimage
  • 23:47 eileen: civicrm upgraded from bfedbcb9 to 9afa91fb
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2041.codfw.wmnet with OS bullseye
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2042.codfw.wmnet with OS bullseye
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2040.codfw.wmnet with OS bullseye
  • 23:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2040']
  • 23:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2041.codfw.wmnet with reason: host reimage
  • 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2041.codfw.wmnet with reason: host reimage
  • 23:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2042.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2042.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2040']
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2043.codfw.wmnet with OS bullseye
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2041.codfw.wmnet with OS bullseye
  • 22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2041']
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2044.codfw.wmnet with OS bullseye
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2043.codfw.wmnet with reason: host reimage
  • 22:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2045.codfw.wmnet with OS bullseye
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2042.codfw.wmnet with OS bullseye
  • 22:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2043.codfw.wmnet with reason: host reimage
  • 22:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2041']
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2042']
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2044.codfw.wmnet with reason: host reimage
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2042']
  • 22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2045.codfw.wmnet with reason: host reimage
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2046.codfw.wmnet with OS bullseye
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2044.codfw.wmnet with reason: host reimage
  • 22:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2045.codfw.wmnet with reason: host reimage
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2043.codfw.wmnet with OS bullseye
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2043']
  • 22:20 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2046.codfw.wmnet with reason: host reimage
  • 22:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2043']
  • 22:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2046.codfw.wmnet with reason: host reimage
  • 22:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2044.codfw.wmnet with OS bullseye
  • 22:08 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw and A:cp
  • 22:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2048.codfw.wmnet with OS bullseye
  • 22:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2045.codfw.wmnet with OS bullseye
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw and A:cp
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2044']
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2045']
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
  • 21:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2044']
  • 21:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2044']
  • 21:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2044']
  • 21:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2045']
  • 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2046.codfw.wmnet with OS bullseye
  • 21:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
  • 21:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2047.codfw.wmnet with OS bullseye
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2046']
  • 21:40 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
  • 21:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
  • 21:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
  • 21:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
  • 21:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
  • 21:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
  • 21:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2047.codfw.wmnet with reason: host reimage
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2047.codfw.wmnet with reason: host reimage
  • 21:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:15 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
  • 21:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
  • 21:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
  • 21:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export/downtime test
  • 21:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export/downtime test
  • 21:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
  • 21:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2046']
  • 21:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2049.codfw.wmnet with OS bullseye
  • 21:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2050.codfw.wmnet with OS bullseye
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2047.codfw.wmnet with OS bullseye
  • 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2047']
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
  • 20:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
  • 20:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2051.codfw.wmnet with OS bullseye
  • 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 20:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
  • 20:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2048.codfw.wmnet with OS bullseye
  • 20:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2047']
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2049.codfw.wmnet with reason: host reimage
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2048']
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2050.codfw.wmnet with reason: host reimage
  • 20:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
  • 20:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
  • 20:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2049.codfw.wmnet with reason: host reimage
  • 20:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2050.codfw.wmnet with reason: host reimage
  • 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2051.codfw.wmnet with reason: host reimage
  • 20:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2051.codfw.wmnet with reason: host reimage
  • 20:33 hmonroy@deploy1002: Finished scap: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797) (duration: 07m 09s)
  • 20:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2048']
  • 20:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2052.codfw.wmnet with OS bullseye
  • 20:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
  • 20:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
  • 20:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:28 hmonroy@deploy1002: dreamyjazz and hmonroy: Continuing with sync
  • 20:28 hmonroy@deploy1002: dreamyjazz and hmonroy: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:26 hmonroy@deploy1002: Started scap: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797)
  • 20:25 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
  • 20:23 hmonroy@deploy1002: Finished scap: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754) (duration: 10m 24s)
  • 20:18 hmonroy@deploy1002: hmonroy: Continuing with sync
  • 20:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
  • 20:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2049.codfw.wmnet with OS bullseye
  • 20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2050.codfw.wmnet with OS bullseye
  • 20:14 hmonroy@deploy1002: hmonroy: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2052.codfw.wmnet with reason: host reimage
  • 20:13 hmonroy@deploy1002: Started scap: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754)
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2052.codfw.wmnet with reason: host reimage
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2051.codfw.wmnet with OS bullseye
  • 20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
  • 20:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
  • 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
  • 20:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
  • 19:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2052.codfw.wmnet with OS bullseye
  • 19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2052']
  • 19:43 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
  • 19:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
  • 19:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
  • 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt DNS for kubernetes2051 - pt1979@cumin2002"
  • 19:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2052']
  • 19:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt DNS for kubernetes2051 - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2053.codfw.wmnet with OS bullseye
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2003.codfw.wmnet
  • 19:20 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:14 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2003.codfw.wmnet
  • 19:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
  • 19:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage
  • 19:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
  • 18:57 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cassandra-dev2001.codfw.wmnet
  • 18:56 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@33de526]: (no justification provided) (duration: 00m 20s)
  • 18:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@33de526]: (no justification provided)
  • 18:45 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2001.codfw.wmnet
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye
  • 18:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2053.codfw.wmnet with OS bullseye
  • 18:19 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.23 refs T343725 (duration: 06m 01s)
  • 18:19 herron: re-enabled icinga meta-monitoring on wikitech-static
  • 18:17 denisse: alert hosts maintenance finished
  • 18:13 denisse: making alert1001 the primary alert host
  • 18:09 denisse: updating DNS to point to alert1001
  • 18:03 denisse: failing over from alert2001 to alert1001
  • 17:51 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert1001.wikimedia.org
  • 17:51 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 17:47 denisse: make alert2001 the active host
  • 17:31 denisse: failing over alert1001 to alert2001
  • 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw and A:cp
  • 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw and A:cp
  • 17:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye
  • 17:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-upload_eqiad and A:cp
  • 17:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_eqiad and A:cp
  • 17:22 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp
  • 17:22 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002"
  • 17:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002"
  • 17:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 17:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 17:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2053']
  • 17:07 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org
  • 17:07 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org
  • 17:06 denisse: reboot alert2001 for a kernel upgrade
  • 17:05 herron: set icinga downtime on wikitech-static
  • 17:03 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 17:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2053']
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2053 - pt1979@cumin2002"
  • 16:37 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:35 bblack: cp3067-81 - rolling restart of varnish frontends (one at a time, 30 minute sleep between, will run for ~7.5h), for experimental cache memory settings from https://gerrit.wikimedia.org/r/c/operations/puppet/+/951949
  • 16:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:24 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:17 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 16:17 effie: depool maps/kartotherian codfw
  • 16:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:09 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad and A:cp
  • 16:08 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:07 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad and A:cp
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:06 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:05 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:57 bblack: cp3066 - varnish-frontend-restart for new memory params experiment
  • 15:55 effie: pooled codfw kartotherian/maps
  • 15:54 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:44 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 15:44 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 15:40 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:40 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:39 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 15:39 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 15:38 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:37 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:37 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:35 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bookworm
  • 15:34 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 15:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams sandbox - ayounsi@cumin1001"
  • 15:33 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 15:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 15:31 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 15:31 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams sandbox - ayounsi@cumin1001"
  • 15:30 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 15:29 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 15:29 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:21 brennen@deploy1002: Finished deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885) (duration: 00m 38s)
  • 15:21 brennen@deploy1002: Started deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885)
  • 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 15:12 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 15:11 brennen@deploy1002: Finished deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885) (duration: 00m 34s)
  • 15:10 brennen@deploy1002: Started deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885)
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4004.wikimedia.org
  • 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Switch Phabricator to Phorge
  • 15:02 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Switch Phabricator to Phorge
  • 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
  • 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 14:59 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:59 akosiaris: pool kartotherian in codfw for testing T344324
  • 14:58 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4004.wikimedia.org
  • 14:58 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:57 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1117.eqiad.wmnet with OS bullseye
  • 14:57 akosiaris: deploy codfw tegola-vector-tiles with high CPU limits to rule out a hunch. T344324
  • 14:56 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
  • 14:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 14:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
  • 14:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
  • 14:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 14:34 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:34 akosiaris: depool again kartotherian in codfw for testing T344324
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4052.*,cp5032.*} and A:cp
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 14:32 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 14:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:28 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4052.*,cp5032.*} and A:cp
  • 14:26 vgutierrez: update to HAProxy 2.7.10 in cp4052 and cp5032 - T344047
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 14:23 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:23 akosiaris: pool kartotherian in codfw for testing T344324
  • 14:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 14:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 14:16 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 14:06 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:06 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) (duration: 13m 10s)
  • 14:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2053 - pt1979@cumin2002"
  • 14:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 14:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 14:00 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
  • 13:56 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 13:54 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787)
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testvm2004.codfw.wmnet with OS bookworm
  • 13:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 13:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 13:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) (duration: 21m 12s)
  • 13:33 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
  • 13:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 13:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:17 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787)
  • 13:17 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [pawiki] Enable the SandboxLink extension (T344815) (duration: 12m 06s)
  • 13:16 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:11 lucaswerkmeister-wmde@deploy1002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp
  • 13:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp
  • 13:06 lucaswerkmeister-wmde@deploy1002: zoranzoki21 and lucaswerkmeister-wmde: Backport for [pawiki] Enable the SandboxLink extension (T344815) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [pawiki] Enable the SandboxLink extension (T344815)
  • 13:03 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 13:03 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 13:01 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 13:01 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 12:58 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 12:58 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 12:56 jelto: registry* - upgrade jwt-authorizer package on all 4 hosts to version 1.2.0-1 - T337474
  • 12:49 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp
  • 12:49 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp
  • 12:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 12:48 jelto: update jwt-authorizer package to v1.2.0 - T337474
  • 12:48 jelto: update jwt-authorizer package to v1.2.0
  • 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 12:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-eqiad
  • 12:34 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 12:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-eqiad
  • 12:26 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:26 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51102 and previous config saved to /var/cache/conftool/dbconfig/20230823-122440-ladsgroup.json
  • 12:19 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp
  • 12:19 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp
  • 12:17 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-codfw
  • 12:12 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 12:11 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:11 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51101 and previous config saved to /var/cache/conftool/dbconfig/20230823-120933-ladsgroup.json
  • 12:03 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-codfw
  • 12:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 12:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51100 and previous config saved to /var/cache/conftool/dbconfig/20230823-115427-ladsgroup.json
  • 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 11:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_ulsfo and A:cp
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51099 and previous config saved to /var/cache/conftool/dbconfig/20230823-113921-ladsgroup.json
  • 11:37 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51098 and previous config saved to /var/cache/conftool/dbconfig/20230823-113310-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51097 and previous config saved to /var/cache/conftool/dbconfig/20230823-113244-ladsgroup.json
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1002.eqiad.wmnet
  • 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 11:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1002.eqiad.wmnet
  • 11:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas2001.wikimedia.org
  • 11:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:24 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas2001.wikimedia.org on all recursors
  • 11:21 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache atlas2001.wikimedia.org on all recursors
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:19 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51096 and previous config saved to /var/cache/conftool/dbconfig/20230823-111737-ladsgroup.json
  • 11:17 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:17 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host atlas2001.wikimedia.org
  • 11:15 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51095 and previous config saved to /var/cache/conftool/dbconfig/20230823-111500-ladsgroup.json
  • 11:02 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51094 and previous config saved to /var/cache/conftool/dbconfig/20230823-110231-ladsgroup.json
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and not P{cp2042.*} and A:cp
  • 11:00 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2041.*} and not P{cp2039.*} and A:cp
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P51093 and previous config saved to /var/cache/conftool/dbconfig/20230823-105954-ladsgroup.json
  • 10:54 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 10:54 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51092 and previous config saved to /var/cache/conftool/dbconfig/20230823-104725-ladsgroup.json
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and not P{cp2042.*} and A:cp
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2041.*} and not P{cp2039.*} and A:cp
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P51091 and previous config saved to /var/cache/conftool/dbconfig/20230823-104445-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51090 and previous config saved to /var/cache/conftool/dbconfig/20230823-104308-ladsgroup.json
  • 10:40 vgutierrez: rolling upgrade to HAProxy 2.6.15 - T344047
  • 10:37 vgutierrez: repool cp2039
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51089 and previous config saved to /var/cache/conftool/dbconfig/20230823-102939-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51088 and previous config saved to /var/cache/conftool/dbconfig/20230823-102801-ladsgroup.json
  • 10:14 vgutierrez: depool cp2039 to run some HAProxy experiments
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51087 and previous config saved to /var/cache/conftool/dbconfig/20230823-101255-ladsgroup.json
  • 10:09 fabfur: temporary depool/repool cp4040 for haproxy service restart
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51086 and previous config saved to /var/cache/conftool/dbconfig/20230823-100340-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51085 and previous config saved to /var/cache/conftool/dbconfig/20230823-095749-ladsgroup.json
  • 09:57 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51084 and previous config saved to /var/cache/conftool/dbconfig/20230823-095040-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51083 and previous config saved to /var/cache/conftool/dbconfig/20230823-094916-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51082 and previous config saved to /var/cache/conftool/dbconfig/20230823-094851-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P51081 and previous config saved to /var/cache/conftool/dbconfig/20230823-094834-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51079 and previous config saved to /var/cache/conftool/dbconfig/20230823-094727-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51078 and previous config saved to /var/cache/conftool/dbconfig/20230823-094706-ladsgroup.json
  • 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1010.eqiad.wmnet
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51075 and previous config saved to /var/cache/conftool/dbconfig/20230823-093345-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P51074 and previous config saved to /var/cache/conftool/dbconfig/20230823-093327-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P51073 and previous config saved to /var/cache/conftool/dbconfig/20230823-093200-ladsgroup.json
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1010.eqiad.wmnet
  • 09:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1009.eqiad.wmnet
  • 09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1009.eqiad.wmnet
  • 09:24 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:21 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51072 and previous config saved to /var/cache/conftool/dbconfig/20230823-091838-ladsgroup.json
  • 09:18 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51071 and previous config saved to /var/cache/conftool/dbconfig/20230823-091821-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P51070 and previous config saved to /var/cache/conftool/dbconfig/20230823-091653-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51069 and previous config saved to /var/cache/conftool/dbconfig/20230823-091242-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51068 and previous config saved to /var/cache/conftool/dbconfig/20230823-091221-ladsgroup.json
  • 09:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2041.codfw.wmnet} and A:cp
  • 09:05 vgutierrez: update to HAProxy 2.6.15 in cp2041 (text) - T344047
  • 09:05 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2041.codfw.wmnet} and A:cp
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51067 and previous config saved to /var/cache/conftool/dbconfig/20230823-090332-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51066 and previous config saved to /var/cache/conftool/dbconfig/20230823-090147-ladsgroup.json
  • 08:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2042.codfw.wmnet} and A:cp
  • 08:59 vgutierrez: update to HAProxy 2.6.15 in cp2042 (upload) - T344047
  • 08:58 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2042.codfw.wmnet} and A:cp
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P51065 and previous config saved to /var/cache/conftool/dbconfig/20230823-085715-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51064 and previous config saved to /var/cache/conftool/dbconfig/20230823-085706-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51063 and previous config saved to /var/cache/conftool/dbconfig/20230823-085640-ladsgroup.json
  • 08:47 fabfur: run puppet agent on lvs5004 to clear alert
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51062 and previous config saved to /var/cache/conftool/dbconfig/20230823-084711-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51061 and previous config saved to /var/cache/conftool/dbconfig/20230823-084646-ladsgroup.json
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P51060 and previous config saved to /var/cache/conftool/dbconfig/20230823-084203-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51059 and previous config saved to /var/cache/conftool/dbconfig/20230823-084134-ladsgroup.json
  • 08:35 vgutierrez: fetch HAProxy 2.6.15 on thirdparty/haproxy26 for bullseye (apt.wm.o) - T344047
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P51058 and previous config saved to /var/cache/conftool/dbconfig/20230823-083140-ladsgroup.json
  • 08:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1011.eqiad.wmnet
  • 08:29 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:28 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51057 and previous config saved to /var/cache/conftool/dbconfig/20230823-082657-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51056 and previous config saved to /var/cache/conftool/dbconfig/20230823-082628-ladsgroup.json
  • 08:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1011.eqiad.wmnet
  • 08:21 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51055 and previous config saved to /var/cache/conftool/dbconfig/20230823-082116-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51054 and previous config saved to /var/cache/conftool/dbconfig/20230823-082047-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P51053 and previous config saved to /var/cache/conftool/dbconfig/20230823-081633-ladsgroup.json
  • 08:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51052 and previous config saved to /var/cache/conftool/dbconfig/20230823-081122-ladsgroup.json
  • 08:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P51051 and previous config saved to /var/cache/conftool/dbconfig/20230823-080541-ladsgroup.json
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51050 and previous config saved to /var/cache/conftool/dbconfig/20230823-080500-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 08:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51049 and previous config saved to /var/cache/conftool/dbconfig/20230823-080435-ladsgroup.json
  • 08:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 08:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51048 and previous config saved to /var/cache/conftool/dbconfig/20230823-080127-ladsgroup.json
  • 08:00 fabfur: running puppet agent on lvs5006
  • 08:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 07:56 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_ulsfo and A:cp
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P51047 and previous config saved to /var/cache/conftool/dbconfig/20230823-075035-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51046 and previous config saved to /var/cache/conftool/dbconfig/20230823-074928-ladsgroup.json
  • 07:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
  • 07:36 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51045 and previous config saved to /var/cache/conftool/dbconfig/20230823-073529-ladsgroup.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51044 and previous config saved to /var/cache/conftool/dbconfig/20230823-073422-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51043 and previous config saved to /var/cache/conftool/dbconfig/20230823-073001-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51042 and previous config saved to /var/cache/conftool/dbconfig/20230823-072940-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51041 and previous config saved to /var/cache/conftool/dbconfig/20230823-071953-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51040 and previous config saved to /var/cache/conftool/dbconfig/20230823-071916-ladsgroup.json
  • 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P51039 and previous config saved to /var/cache/conftool/dbconfig/20230823-071433-ladsgroup.json
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51038 and previous config saved to /var/cache/conftool/dbconfig/20230823-071249-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51037 and previous config saved to /var/cache/conftool/dbconfig/20230823-071220-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P51036 and previous config saved to /var/cache/conftool/dbconfig/20230823-065927-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51035 and previous config saved to /var/cache/conftool/dbconfig/20230823-065714-ladsgroup.json
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51034 and previous config saved to /var/cache/conftool/dbconfig/20230823-064421-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51033 and previous config saved to /var/cache/conftool/dbconfig/20230823-064207-ladsgroup.json
  • 06:40 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1003.wikimedia.org
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51032 and previous config saved to /var/cache/conftool/dbconfig/20230823-064019-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51031 and previous config saved to /var/cache/conftool/dbconfig/20230823-063754-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P51030 and previous config saved to /var/cache/conftool/dbconfig/20230823-063733-ladsgroup.json
  • 06:33 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1003.wikimedia.org
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51029 and previous config saved to /var/cache/conftool/dbconfig/20230823-062701-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P51028 and previous config saved to /var/cache/conftool/dbconfig/20230823-062513-ladsgroup.json
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P51027 and previous config saved to /var/cache/conftool/dbconfig/20230823-062227-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T344589)', diff saved to https://phabricator.wikimedia.org/P51026 and previous config saved to /var/cache/conftool/dbconfig/20230823-062136-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51025 and previous config saved to /var/cache/conftool/dbconfig/20230823-062112-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51024 and previous config saved to /var/cache/conftool/dbconfig/20230823-062038-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51023 and previous config saved to /var/cache/conftool/dbconfig/20230823-062013-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P51022 and previous config saved to /var/cache/conftool/dbconfig/20230823-061007-ladsgroup.json
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P51021 and previous config saved to /var/cache/conftool/dbconfig/20230823-060721-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51020 and previous config saved to /var/cache/conftool/dbconfig/20230823-060606-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51019 and previous config saved to /var/cache/conftool/dbconfig/20230823-060506-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51018 and previous config saved to /var/cache/conftool/dbconfig/20230823-055500-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P51017 and previous config saved to /var/cache/conftool/dbconfig/20230823-055215-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51016 and previous config saved to /var/cache/conftool/dbconfig/20230823-055059-ladsgroup.json
  • 05:50 zabe@deploy1002: Backport cancelled.
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51015 and previous config saved to /var/cache/conftool/dbconfig/20230823-055000-ladsgroup.json
  • 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51014 and previous config saved to /var/cache/conftool/dbconfig/20230823-054144-ladsgroup.json
  • 05:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51013 and previous config saved to /var/cache/conftool/dbconfig/20230823-054124-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51012 and previous config saved to /var/cache/conftool/dbconfig/20230823-053553-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51011 and previous config saved to /var/cache/conftool/dbconfig/20230823-053454-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51010 and previous config saved to /var/cache/conftool/dbconfig/20230823-052939-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51009 and previous config saved to /var/cache/conftool/dbconfig/20230823-052915-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51008 and previous config saved to /var/cache/conftool/dbconfig/20230823-052834-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 05:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P51007 and previous config saved to /var/cache/conftool/dbconfig/20230823-052809-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P51006 and previous config saved to /var/cache/conftool/dbconfig/20230823-052637-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P51005 and previous config saved to /var/cache/conftool/dbconfig/20230823-052618-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51004 and previous config saved to /var/cache/conftool/dbconfig/20230823-051409-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51003 and previous config saved to /var/cache/conftool/dbconfig/20230823-051312-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51002 and previous config saved to /var/cache/conftool/dbconfig/20230823-051303-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P51001 and previous config saved to /var/cache/conftool/dbconfig/20230823-051251-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P51000 and previous config saved to /var/cache/conftool/dbconfig/20230823-051131-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50999 and previous config saved to /var/cache/conftool/dbconfig/20230823-051112-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50998 and previous config saved to /var/cache/conftool/dbconfig/20230823-045902-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50997 and previous config saved to /var/cache/conftool/dbconfig/20230823-045757-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50996 and previous config saved to /var/cache/conftool/dbconfig/20230823-045744-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P50995 and previous config saved to /var/cache/conftool/dbconfig/20230823-045625-ladsgroup.json
  • 04:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50994 and previous config saved to /var/cache/conftool/dbconfig/20230823-045606-ladsgroup.json
  • 04:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50993 and previous config saved to /var/cache/conftool/dbconfig/20230823-045038-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P50992 and previous config saved to /var/cache/conftool/dbconfig/20230823-044741-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50991 and previous config saved to /var/cache/conftool/dbconfig/20230823-044717-ladsgroup.json
  • 04:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P50990 and previous config saved to /var/cache/conftool/dbconfig/20230823-044356-ladsgroup.json
  • 04:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P50989 and previous config saved to /var/cache/conftool/dbconfig/20230823-044251-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50988 and previous config saved to /var/cache/conftool/dbconfig/20230823-044238-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P50987 and previous config saved to /var/cache/conftool/dbconfig/20230823-043741-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50986 and previous config saved to /var/cache/conftool/dbconfig/20230823-043716-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P50985 and previous config saved to /var/cache/conftool/dbconfig/20230823-043625-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50984 and previous config saved to /var/cache/conftool/dbconfig/20230823-043600-ladsgroup.json
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50983 and previous config saved to /var/cache/conftool/dbconfig/20230823-043216-ladsgroup.json
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P50982 and previous config saved to /var/cache/conftool/dbconfig/20230823-043210-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50981 and previous config saved to /var/cache/conftool/dbconfig/20230823-042732-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50980 and previous config saved to /var/cache/conftool/dbconfig/20230823-042210-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50979 and previous config saved to /var/cache/conftool/dbconfig/20230823-042054-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50978 and previous config saved to /var/cache/conftool/dbconfig/20230823-041712-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P50977 and previous config saved to /var/cache/conftool/dbconfig/20230823-041704-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50976 and previous config saved to /var/cache/conftool/dbconfig/20230823-040704-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50975 and previous config saved to /var/cache/conftool/dbconfig/20230823-040548-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50974 and previous config saved to /var/cache/conftool/dbconfig/20230823-040207-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50973 and previous config saved to /var/cache/conftool/dbconfig/20230823-040158-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50972 and previous config saved to /var/cache/conftool/dbconfig/20230823-035707-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50971 and previous config saved to /var/cache/conftool/dbconfig/20230823-035157-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50970 and previous config saved to /var/cache/conftool/dbconfig/20230823-035042-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50969 and previous config saved to /var/cache/conftool/dbconfig/20230823-034656-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50968 and previous config saved to /var/cache/conftool/dbconfig/20230823-034643-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50967 and previous config saved to /var/cache/conftool/dbconfig/20230823-034549-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50966 and previous config saved to /var/cache/conftool/dbconfig/20230823-034519-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 00:35 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Disable legacy SSL port — T339299 - eevans@cumin1001

2023-08-22

  • 23:49 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 23:45 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 22:59 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 21:21 eileen: config revision changed from 1ea8201f to c2f91f49
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[09-12].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 20:48 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin and A:cp
  • 20:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[09-12].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 20:29 inflatador: bking@cumin1001 enable/run puppet on hosts after rollback T343856
  • 20:27 urbanecm@deploy1002: Finished scap: Backport for Declare v1 of the page_content_change stream. (T307959) (duration: 11m 19s)
  • 20:21 urbanecm@deploy1002: urbanecm and gmodena: Continuing with sync
  • 20:17 urbanecm@deploy1002: urbanecm and gmodena: Backport for Declare v1 of the page_content_change stream. (T307959) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:16 urbanecm@deploy1002: Started scap: Backport for Declare v1 of the page_content_change stream. (T307959)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for clienthints: Collect Client Hints data on all wikis (T341110) (duration: 12m 29s)
  • 20:09 urbanecm@deploy1002: dreamyjazz and urbanecm: Continuing with sync
  • 20:04 urbanecm@deploy1002: dreamyjazz and urbanecm: Backport for clienthints: Collect Client Hints data on all wikis (T341110) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:02 urbanecm@deploy1002: Started scap: Backport for clienthints: Collect Client Hints data on all wikis (T341110)
  • 19:42 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 19:33 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3004.wikimedia.org
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3004.wikimedia.org with OS bullseye
  • 19:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 19:17 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 19:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 19:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 19:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 19:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 19:03 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 19:02 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 18:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bullseye
  • 18:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:55 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3004.wikimedia.org on all recursors
  • 18:54 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3004.wikimedia.org on all recursors
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:53 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:51 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:51 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3004.wikimedia.org
  • 18:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 18:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3004.wikimedia.org
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:48 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:46 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh3004.wikimedia.org
  • 18:41 sukhe: decommissioning doh3004 as it was added in the same Ganeti cluster as doh3003
  • 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3004.wikimedia.org
  • 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3004.wikimedia.org with OS bullseye
  • 18:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 18:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.23 refs T343725
  • 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 18:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 18:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 18:17 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 18:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1004.eqiad.wmnet
  • 18:06 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 18:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bullseye
  • 18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:01 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3004.wikimedia.org on all recursors
  • 18:01 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3004.wikimedia.org on all recursors
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 17:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 17:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 17:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:57 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3004.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3003.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3003.wikimedia.org with OS bullseye
  • 17:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs200[5-8].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 17:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 17:30 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs200[5-8].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50965 and previous config saved to /var/cache/conftool/dbconfig/20230822-172748-ladsgroup.json
  • 17:26 joal@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 02m 01s)
  • 17:24 joal@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 17:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bullseye
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:18 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 17:17 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:14 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3003.wikimedia.org
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs200[2-4].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50964 and previous config saved to /var/cache/conftool/dbconfig/20230822-171242-ladsgroup.json
  • 17:10 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 17:07 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh3003.wikimedia.org
  • 17:01 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 16:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs200[2-4].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 16:57 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2004.codfw.wmnet
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50963 and previous config saved to /var/cache/conftool/dbconfig/20230822-165736-ladsgroup.json
  • 16:51 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50962 and previous config saved to /var/cache/conftool/dbconfig/20230822-164229-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50961 and previous config saved to /var/cache/conftool/dbconfig/20230822-163609-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50960 and previous config saved to /var/cache/conftool/dbconfig/20230822-163544-ladsgroup.json
  • 16:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 16:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2023.codfw.wmnet with OS bullseye
  • 16:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: re-add wikidough ips - sukhe@cumin2002"
  • 16:25 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: re-add wikidough ips - sukhe@cumin2002"
  • 16:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50959 and previous config saved to /var/cache/conftool/dbconfig/20230822-162038-ladsgroup.json
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3003.wikimedia.org
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 16:14 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:13 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:12 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 16:11 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:11 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2023.codfw.wmnet with reason: host reimage
  • 16:09 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:09 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 16:08 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clear wikidough ips - sukhe@cumin2002"
  • 16:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clear wikidough ips - sukhe@cumin2002"
  • 16:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2023.codfw.wmnet with reason: host reimage
  • 16:05 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50958 and previous config saved to /var/cache/conftool/dbconfig/20230822-160532-ladsgroup.json
  • 16:05 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3003.wikimedia.org
  • 16:05 sukhe@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
  • 15:58 sukhe: sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 8 --disk 15 --network public --os bullseye --cluster esams01 --group BY27 -t T344355 doh3003
  • 15:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:57 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50957 and previous config saved to /var/cache/conftool/dbconfig/20230822-155025-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50956 and previous config saved to /var/cache/conftool/dbconfig/20230822-154712-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50955 and previous config saved to /var/cache/conftool/dbconfig/20230822-154621-ladsgroup.json
  • 15:40 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin and A:cp
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P50954 and previous config saved to /var/cache/conftool/dbconfig/20230822-153206-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50953 and previous config saved to /var/cache/conftool/dbconfig/20230822-153115-ladsgroup.json
  • 15:29 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[12,15,18,21].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P50952 and previous config saved to /var/cache/conftool/dbconfig/20230822-151700-ladsgroup.json
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50951 and previous config saved to /var/cache/conftool/dbconfig/20230822-151608-ladsgroup.json
  • 15:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 15:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[12,15,18,21].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:07 moritzm: installing hdf5 security updates
  • 15:07 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[11,14,17,20].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:07 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2056.codfw.wmnet with OS bullseye
  • 15:06 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 15:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 15:04 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2055.codfw.wmnet with OS bullseye
  • 15:03 kevinbazira: stat1008: Remove `aswiki` from the published datasets repo `/srv/published/datasets/one-off/research-mwaddlink` (T344319)
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50949 and previous config saved to /var/cache/conftool/dbconfig/20230822-150153-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50948 and previous config saved to /var/cache/conftool/dbconfig/20230822-150102-ladsgroup.json
  • 15:00 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 14:59 kevinbazira: tools.stashbot stat1008: Remove `aswiki` from `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T344319)
  • 14:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50947 and previous config saved to /var/cache/conftool/dbconfig/20230822-145544-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50946 and previous config saved to /var/cache/conftool/dbconfig/20230822-145442-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50945 and previous config saved to /var/cache/conftool/dbconfig/20230822-145419-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50944 and previous config saved to /var/cache/conftool/dbconfig/20230822-145418-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50942 and previous config saved to /var/cache/conftool/dbconfig/20230822-145353-ladsgroup.json
  • 14:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 14:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[11,14,17,20].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 14:49 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2056.codfw.wmnet with reason: host reimage
  • 14:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2023.codfw.wmnet with OS bullseye
  • 14:46 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2055.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2056.codfw.wmnet with reason: host reimage
  • 14:44 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs101[6,9].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:43 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2055.codfw.wmnet with reason: host reimage
  • 14:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
  • 14:40 taavi@deploy1002: Finished scap: Backport for wmf-config: update new esams IP ranges (T329219) (duration: 09m 50s)
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50940 and previous config saved to /var/cache/conftool/dbconfig/20230822-143912-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50939 and previous config saved to /var/cache/conftool/dbconfig/20230822-143847-ladsgroup.json
  • 14:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 14:37 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs101[6,9].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1058.eqiad.wmnet with reason: host reimage
  • 14:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1057.eqiad.wmnet with reason: host reimage
  • 14:32 taavi@deploy1002: taavi and sukhe: Continuing with sync
  • 14:32 taavi@deploy1002: taavi and sukhe: Backport for wmf-config: update new esams IP ranges (T329219) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1058.eqiad.wmnet with reason: host reimage
  • 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1057.eqiad.wmnet with reason: host reimage
  • 14:30 taavi@deploy1002: Started scap: Backport for wmf-config: update new esams IP ranges (T329219)
  • 14:27 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 00m 04s)
  • 14:27 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 14:25 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2056.codfw.wmnet with OS bullseye
  • 14:24 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50938 and previous config saved to /var/cache/conftool/dbconfig/20230822-142405-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50937 and previous config saved to /var/cache/conftool/dbconfig/20230822-142341-ladsgroup.json
  • 14:22 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2055.codfw.wmnet with OS bullseye
  • 14:20 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 03m 15s)
  • 14:20 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 14:17 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 14:16 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (thin): Regular analytics weekly train THIN [analytics/refinery@d62f281] (duration: 00m 04s)
  • 14:16 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (thin): Regular analytics weekly train THIN [analytics/refinery@d62f281]
  • 14:16 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281]: Regular analytics weekly train [analytics/refinery@d62f281] (duration: 05m 39s)
  • 14:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "refreshing kubernetes205[56] kubernetes105[78] status T343996 T343993 - hnowlan@cumin1001"
  • 14:14 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "refreshing kubernetes205[56] kubernetes105[78] status T343996 T343993 - hnowlan@cumin1001"
  • 14:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281]: Regular analytics weekly train [analytics/refinery@d62f281]
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50936 and previous config saved to /var/cache/conftool/dbconfig/20230822-140859-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50935 and previous config saved to /var/cache/conftool/dbconfig/20230822-140835-ladsgroup.json
  • 14:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 14:07 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50934 and previous config saved to /var/cache/conftool/dbconfig/20230822-140417-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50933 and previous config saved to /var/cache/conftool/dbconfig/20230822-140236-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50932 and previous config saved to /var/cache/conftool/dbconfig/20230822-140213-ladsgroup.json
  • 14:01 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 14:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
  • 13:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:55 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:52 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
  • 13:49 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50930 and previous config saved to /var/cache/conftool/dbconfig/20230822-134911-ladsgroup.json
  • 13:48 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:48 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
  • 13:48 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50929 and previous config saved to /var/cache/conftool/dbconfig/20230822-134703-ladsgroup.json
  • 13:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 13:43 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:40 urbanecm@deploy1002: Finished scap: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679) (duration: 19m 44s)
  • 13:35 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50928 and previous config saved to /var/cache/conftool/dbconfig/20230822-133405-ladsgroup.json
  • 13:33 urbanecm@deploy1002: urbanecm and dreamyjazz and anzx: Continuing with sync
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50927 and previous config saved to /var/cache/conftool/dbconfig/20230822-133157-ladsgroup.json
  • 13:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:24 klausman: Draining ml-serve2008 for kubelet partition resize
  • 13:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1003.eqiad.wmnet
  • 13:21 urbanecm@deploy1002: urbanecm and dreamyjazz and anzx: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet,
  • 13:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host miscweb1003.eqiad.wmnet
  • 13:20 sukhe: [done] finished repooling esams
  • 13:20 urbanecm@deploy1002: Started scap: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679)
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158) (duration: 15m 43s)
  • 13:19 sukhe: repool esams
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50926 and previous config saved to /var/cache/conftool/dbconfig/20230822-131859-ladsgroup.json
  • 13:17 klausman: Draining ml-serve2007 for kubelet partition resize
  • 13:17 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:17 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50925 and previous config saved to /var/cache/conftool/dbconfig/20230822-131651-ladsgroup.json
  • 13:16 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:13 urbanecm@deploy1002: urbanecm and matmarex: Continuing with sync
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50924 and previous config saved to /var/cache/conftool/dbconfig/20230822-131250-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50923 and previous config saved to /var/cache/conftool/dbconfig/20230822-131122-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50922 and previous config saved to /var/cache/conftool/dbconfig/20230822-131057-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50921 and previous config saved to /var/cache/conftool/dbconfig/20230822-130920-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:09 sukhe: [done] authdns-update for old references
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50919 and previous config saved to /var/cache/conftool/dbconfig/20230822-130856-ladsgroup.json
  • 13:07 sukhe: running authdns-update to remove old references to esams
  • 13:05 urbanecm@deploy1002: urbanecm and matmarex: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codf
  • 13:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:03 urbanecm@deploy1002: Started scap: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158)
  • 13:01 urbanecm: stat1008: Remove `krcwiki` and `ganwiki` from `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T344686)
  • 12:56 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2056.codfw.wmnet with OS bullseye
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50918 and previous config saved to /var/cache/conftool/dbconfig/20230822-125550-ladsgroup.json
  • 12:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2055.codfw.wmnet with OS bullseye
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50917 and previous config saved to /var/cache/conftool/dbconfig/20230822-125350-ladsgroup.json
  • 12:46 urbanecm: mwmaint1002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --scoreLessThan=0.6 --verbose | tee growth-T316079-revalidate-0.6.log # T316079
  • 12:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be2001.codfw.wmnet
  • 12:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be1001.eqiad.wmnet
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50916 and previous config saved to /var/cache/conftool/dbconfig/20230822-124044-ladsgroup.json
  • 12:39 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50915 and previous config saved to /var/cache/conftool/dbconfig/20230822-123844-ladsgroup.json
  • 12:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be2001.codfw.wmnet
  • 12:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be1001.eqiad.wmnet
  • 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 12:28 fabfur@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_ulsfo and A:cp
  • 12:26 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1003.eqiad.wmnet
  • 12:25 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50914 and previous config saved to /var/cache/conftool/dbconfig/20230822-122538-ladsgroup.json
  • 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50913 and previous config saved to /var/cache/conftool/dbconfig/20230822-122338-ladsgroup.json
  • 12:23 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host releases1003.eqiad.wmnet
  • 12:22 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2003.codfw.wmnet
  • 12:22 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:22 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 12:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 12:20 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50912 and previous config saved to /var/cache/conftool/dbconfig/20230822-121913-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50911 and previous config saved to /var/cache/conftool/dbconfig/20230822-121832-ladsgroup.json
  • 12:18 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host releases2003.codfw.wmnet
  • 12:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1003.eqiad.wmnet
  • 12:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50910 and previous config saved to /var/cache/conftool/dbconfig/20230822-121714-ladsgroup.json
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:13 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host doc1003.eqiad.wmnet
  • 12:13 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2002.codfw.wmnet
  • 12:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50909 and previous config saved to /var/cache/conftool/dbconfig/20230822-121218-ladsgroup.json
  • 12:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 12:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 12:09 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host doc2002.codfw.wmnet
  • 12:07 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50908 and previous config saved to /var/cache/conftool/dbconfig/20230822-120326-ladsgroup.json
  • 12:03 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
  • 12:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
  • 12:00 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P50907 and previous config saved to /var/cache/conftool/dbconfig/20230822-115712-ladsgroup.json
  • 11:51 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
  • 11:49 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 11:49 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50906 and previous config saved to /var/cache/conftool/dbconfig/20230822-114820-ladsgroup.json
  • 11:47 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P50905 and previous config saved to /var/cache/conftool/dbconfig/20230822-114206-ladsgroup.json
  • 11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2056.codfw.wmnet with OS bullseye
  • 11:37 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:36 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:36 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2055.codfw.wmnet with OS bullseye
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50904 and previous config saved to /var/cache/conftool/dbconfig/20230822-113313-ladsgroup.json
  • 11:32 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 11:29 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 11:28 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50903 and previous config saved to /var/cache/conftool/dbconfig/20230822-112659-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50902 and previous config saved to /var/cache/conftool/dbconfig/20230822-112650-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50901 and previous config saved to /var/cache/conftool/dbconfig/20230822-112625-ladsgroup.json
  • 11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50900 and previous config saved to /var/cache/conftool/dbconfig/20230822-112438-ladsgroup.json
  • 11:16 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet1002.eqiad.wmnet
  • 11:15 XioNoX: delete RPKI ROAs for 91.198.174.0/24 and 2a02:ec80:500::/48 - T344579
  • 11:13 XioNoX: delete RIPE route6 object for 2a02:ec80:500::/48 - T344579
  • 11:12 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host planet1002.eqiad.wmnet
  • 11:12 XioNoX: delete RIPE route object for 91.198.174.0/24 - T344579
  • 11:12 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet2002.codfw.wmnet
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50899 and previous config saved to /var/cache/conftool/dbconfig/20230822-111119-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50898 and previous config saved to /var/cache/conftool/dbconfig/20230822-110932-ladsgroup.json
  • 11:08 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host planet2002.codfw.wmnet
  • 11:04 XioNoX: delete old ams-ix circuits from ams-ix portal - T344579
  • 11:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50897 and previous config saved to /var/cache/conftool/dbconfig/20230822-105613-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50896 and previous config saved to /var/cache/conftool/dbconfig/20230822-105425-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50895 and previous config saved to /var/cache/conftool/dbconfig/20230822-104106-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50894 and previous config saved to /var/cache/conftool/dbconfig/20230822-103919-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50893 and previous config saved to /var/cache/conftool/dbconfig/20230822-103417-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50892 and previous config saved to /var/cache/conftool/dbconfig/20230822-103255-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50891 and previous config saved to /var/cache/conftool/dbconfig/20230822-103237-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50890 and previous config saved to /var/cache/conftool/dbconfig/20230822-103231-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50889 and previous config saved to /var/cache/conftool/dbconfig/20230822-103212-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P50888 and previous config saved to /var/cache/conftool/dbconfig/20230822-101725-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50887 and previous config saved to /var/cache/conftool/dbconfig/20230822-101706-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P50886 and previous config saved to /var/cache/conftool/dbconfig/20230822-100219-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50885 and previous config saved to /var/cache/conftool/dbconfig/20230822-100200-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50884 and previous config saved to /var/cache/conftool/dbconfig/20230822-095848-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50883 and previous config saved to /var/cache/conftool/dbconfig/20230822-095632-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50882 and previous config saved to /var/cache/conftool/dbconfig/20230822-095351-ladsgroup.json
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50881 and previous config saved to /var/cache/conftool/dbconfig/20230822-095205-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50880 and previous config saved to /var/cache/conftool/dbconfig/20230822-094712-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50879 and previous config saved to /var/cache/conftool/dbconfig/20230822-094653-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50878 and previous config saved to /var/cache/conftool/dbconfig/20230822-094555-ladsgroup.json
  • 09:43 effie: depool codfw kartotherian (maps)
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50877 and previous config saved to /var/cache/conftool/dbconfig/20230822-094343-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50876 and previous config saved to /var/cache/conftool/dbconfig/20230822-094126-ladsgroup.json
  • 09:40 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50875 and previous config saved to /var/cache/conftool/dbconfig/20230822-093915-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50874 and previous config saved to /var/cache/conftool/dbconfig/20230822-093850-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P50873 and previous config saved to /var/cache/conftool/dbconfig/20230822-093844-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50872 and previous config saved to /var/cache/conftool/dbconfig/20230822-093659-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50871 and previous config saved to /var/cache/conftool/dbconfig/20230822-093227-root.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50870 and previous config saved to /var/cache/conftool/dbconfig/20230822-093147-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:31 effie: pooling temporarily kartotherian codfw
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50869 and previous config saved to /var/cache/conftool/dbconfig/20230822-093055-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50868 and previous config saved to /var/cache/conftool/dbconfig/20230822-093049-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50867 and previous config saved to /var/cache/conftool/dbconfig/20230822-092838-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50866 and previous config saved to /var/cache/conftool/dbconfig/20230822-092620-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P50865 and previous config saved to /var/cache/conftool/dbconfig/20230822-092344-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P50864 and previous config saved to /var/cache/conftool/dbconfig/20230822-092338-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50863 and previous config saved to /var/cache/conftool/dbconfig/20230822-092153-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50862 and previous config saved to /var/cache/conftool/dbconfig/20230822-091722-root.json
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P50861 and previous config saved to /var/cache/conftool/dbconfig/20230822-091549-ladsgroup.json
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50860 and previous config saved to /var/cache/conftool/dbconfig/20230822-091542-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50859 and previous config saved to /var/cache/conftool/dbconfig/20230822-091334-ladsgroup.json
  • 09:11 claime: Redirecting 2% of global traffic to mw-on-k8s - T341780
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50858 and previous config saved to /var/cache/conftool/dbconfig/20230822-091113-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P50857 and previous config saved to /var/cache/conftool/dbconfig/20230822-090838-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50856 and previous config saved to /var/cache/conftool/dbconfig/20230822-090832-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50855 and previous config saved to /var/cache/conftool/dbconfig/20230822-090646-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50854 and previous config saved to /var/cache/conftool/dbconfig/20230822-090217-root.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50853 and previous config saved to /var/cache/conftool/dbconfig/20230822-090056-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P50852 and previous config saved to /var/cache/conftool/dbconfig/20230822-090042-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50851 and previous config saved to /var/cache/conftool/dbconfig/20230822-090036-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50850 and previous config saved to /var/cache/conftool/dbconfig/20230822-090026-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50849 and previous config saved to /var/cache/conftool/dbconfig/20230822-090016-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50848 and previous config saved to /var/cache/conftool/dbconfig/20230822-085332-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50847 and previous config saved to /var/cache/conftool/dbconfig/20230822-084724-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343718)', diff saved to https://phabricator.wikimedia.org/P50846 and previous config saved to /var/cache/conftool/dbconfig/20230822-084713-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50845 and previous config saved to /var/cache/conftool/dbconfig/20230822-084712-root.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50844 and previous config saved to /var/cache/conftool/dbconfig/20230822-084703-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T344589)', diff saved to https://phabricator.wikimedia.org/P50843 and previous config saved to /var/cache/conftool/dbconfig/20230822-084638-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P50842 and previous config saved to /var/cache/conftool/dbconfig/20230822-084550-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50841 and previous config saved to /var/cache/conftool/dbconfig/20230822-084536-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50840 and previous config saved to /var/cache/conftool/dbconfig/20230822-084510-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50839 and previous config saved to /var/cache/conftool/dbconfig/20230822-084445-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:43 vgutierrez: restart ATS on cp5024 to clear the ATS restart alert - T344674
  • 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:42 fabfur@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_ulsfo and A:cp
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P50838 and previous config saved to /var/cache/conftool/dbconfig/20230822-084104-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50837 and previous config saved to /var/cache/conftool/dbconfig/20230822-083912-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50836 and previous config saved to /var/cache/conftool/dbconfig/20230822-083848-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P50835 and previous config saved to /var/cache/conftool/dbconfig/20230822-083207-ladsgroup.json
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P50834 and previous config saved to /var/cache/conftool/dbconfig/20230822-083132-ladsgroup.json
  • 08:31 urbanecm: mwmaint1002: Stop frwiki instance of T315510 scripts due to a large volume of T343859 errors
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P50833 and previous config saved to /var/cache/conftool/dbconfig/20230822-083044-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50832 and previous config saved to /var/cache/conftool/dbconfig/20230822-083004-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P50831 and previous config saved to /var/cache/conftool/dbconfig/20230822-082559-ladsgroup.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P50830 and previous config saved to /var/cache/conftool/dbconfig/20230822-082342-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P50829 and previous config saved to /var/cache/conftool/dbconfig/20230822-081701-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P50828 and previous config saved to /var/cache/conftool/dbconfig/20230822-081626-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50827 and previous config saved to /var/cache/conftool/dbconfig/20230822-081537-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343718)', diff saved to ht