Server Admin Log/Archive 63

2023-02-28

  • 23:58 zabe@deploy2002: Started scap: T198673
  • 23:45 ejegg: civicrm upgraded from ffc16d2d to d199694e
  • 23:43 zabe@deploy2002: Synchronized wmf-config/InitialiseSettings.php: T213295 (duration: 06m 56s)
  • 23:24 mutante: miscweb2002 rm -rf /srv/org/wikimedia/design/blog/ - this has moved to /srv/org/wikimedia/design-blog but was not deleted in codfw - bringing both to the same state before switching design.wikimedia.org over T330090
  • 23:20 zabe@deploy2002: Finished scap: Backport for Drop custom testcommonswiki groups (T213295) (duration: 07m 57s)
  • 23:14 zabe@deploy2002: zabe: Backport for Drop custom testcommonswiki groups (T213295) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:12 zabe@deploy2002: Started scap: Backport for Drop custom testcommonswiki groups (T213295)
  • 22:46 zabe@deploy2002: Synchronized dblists/: close testcommonswiki T213295 (duration: 07m 11s)
  • 22:31 zabe@deploy2002: Synchronized dblists/: close testcommonswiki T213295 (duration: 06m 40s)
  • 22:24 brennen@deploy2002: Finished deploy [phabricator/deployment@3f2dd1b]: debug deploy to aphlict2001 (duration: 00m 37s)
  • 22:23 brennen@deploy2002: Started deploy [phabricator/deployment@3f2dd1b]: debug deploy to aphlict2001
  • 22:01 apergos: started rsync from dumpsdata1001 to dumpsdata1004 of /data/otherdumps, running in ariel screen session, no bandwidth cap
  • 22:00 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:42 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:32 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@fc4e023]: Deploying section_image_recommendations DAG to platform_eng Airflow instance (duration: 00m 21s)
  • 21:32 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@fc4e023]: Deploying section_image_recommendations DAG to platform_eng Airflow instance
  • 21:26 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: Enable Growth features by default on testwikis (T330748) (duration: 07m 43s)
  • 21:19 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: Enable Growth features by default on testwikis (T330748)
  • 21:14 samtar@deploy2002: Finished scap: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444) (duration: 10m 36s)
  • 21:05 samtar@deploy2002: esanders and samtar: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:03 samtar@deploy2002: Started scap: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444)
  • 20:54 zabe@deploy2002: Finished scap: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881) (duration: 10m 06s)
  • 20:45 zabe@deploy2002: zabe: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:43 zabe@deploy2002: Started scap: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881)
  • 20:20 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:16 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:14 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:54 dancy@deploy2002: Started scap: testing
  • 19:51 dancy@deploy2002: Installing scap version "latest" for 550 hosts
  • 19:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2073']
  • 19:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 19:21 ryankemper: [WDQS] (Current time) T301167 Re-enabled icinga notifications for `wdqs20[09-12]`
  • 19:21 ryankemper: [WDQS] (The following was ~20 hours ago, forgot to press enter) T301167 Transferred `/srv/wdqs/categories.jnl` from `wdqs2001` (in-service host) to `wdqs20[09-12]` (new hosts being brought into service)
  • 18:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2073']
  • 18:26 ladsgroup@deploy2002: Finished scap: Backport for Revert "mwscript: Switch to use run.php" (duration: 42m 21s)
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1418.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1417.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1416.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1415.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1448.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1447.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1449.eqiad.wmnet
  • 18:21 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1418.eqiad.wmnet
  • 18:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 18:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2072']
  • 18:13 ladsgroup@deploy2002: trainbranchbot and ladsgroup: Backport for Revert "mwscript: Switch to use run.php" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2072']
  • 18:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2071']
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1418.eqiad.wmnet
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1417.eqiad.wmnet
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1416.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1415.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1448.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1447.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1449.eqiad.wmnet
  • 18:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2071']
  • 17:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:44 ladsgroup@deploy2002: Started scap: Backport for Revert "mwscript: Switch to use run.php"
  • 17:38 ladsgroup@deploy2002: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 24m 05s)
  • 17:38 ladsgroup@deploy2002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 17:33 ladsgroup@deploy2002: ladsgroup: Backport for mwscript: Switch to use run.php (T326800) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 17:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2070']
  • 17:14 ladsgroup@deploy2002: Started scap: Backport for mwscript: Switch to use run.php (T326800)
  • 17:11 ladsgroup@deploy2002: Finished scap: Backport for Convert eval script to Maintenance class (duration: 07m 47s)
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 17:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2070']
  • 17:05 ladsgroup@deploy2002: ladsgroup: Backport for Convert eval script to Maintenance class synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:03 ladsgroup@deploy2002: Started scap: Backport for Convert eval script to Maintenance class
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:59 ejegg: payments-wiki upgraded from 871c4e5c to b9ea2130
  • 16:52 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:45 claime: Traffic and Service switchovers to codfw finished - T330651 - T330650
  • 16:38 claime: stale discovery files wiped for netbox - T330651
  • 16:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1006.eqiad.wmnet with reason: host reimage
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1006.eqiad.wmnet with reason: host reimage
  • 16:20 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool netbox in codfw: T330651
  • 16:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 16:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet on all recursors
  • 16:16 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet on all recursors
  • 16:16 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool netbox in codfw: T330651
  • 16:15 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) pool netbox in eqiad: T330651
  • 16:11 Lucas_WMDE: changed data source of https://grafana-rw.wikimedia.org/alerting/grafana/MF0FSjJ4z/view from “eqiad prometheus/k8s” to “thanos” to query both eqiad and codfw after dc switch
  • 16:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet on all recursors
  • 16:10 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet on all recursors
  • 16:10 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool netbox in eqiad: T330651
  • 16:09 claime: Switching netbox back to eqiad - T330651
  • 16:06 hnowlan@deploy2002: Finished deploy [restbase/deploy@5271b8f]: New wikis: gucwp, gurwp, vewikimedia T320899 T326237 T327843 (duration: 15m 38s)
  • 15:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:50 hnowlan@deploy2002: Started deploy [restbase/deploy@5271b8f]: New wikis: gucwp, gurwp, vewikimedia T320899 T326237 T327843
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:47 cgoubert@deploy2002: Synchronized README: check the deployment server after switchover - T330651 (duration: 20m 56s)
  • 15:47 claime: Traffic: eqiad depooled - T330650
  • 15:45 claime: Running authdns-update - T330650
  • 15:45 claime: Traffic: depool eqiad from user traffic - T330650
  • 15:31 moritzm: installing tiff security updates
  • 15:26 claime: Testing scap deployment from deploy2002.codfw.wmnet - T330651
  • 15:25 claime: Removing scap lock on deploy2002.codfw.wmnet
  • 15:23 _joe_: oblivian@deploy2002:~ $ sudo chown imagecatalog:imagecatalog /srv/deployment/imagecatalog/catalog.sqlite
  • 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 28 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 15:20 claime: Disregard running puppet on fleet-wide - T330651
  • 15:18 claime: Running puppet on fleet-wide - T330651
  • 15:16 claime: Running puppet on all deployment servers - T330651
  • 15:10 claime: Running authdns-update for deployment server switch - T330651
  • 15:05 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1005.eqiad.wmnet
  • 15:05 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 15:04 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 15:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T330761
  • 14:56 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:53 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 claime: Switch deployment server to deploy2002.codfw.wmnet - T330651
  • 14:48 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1005.eqiad.wmnet
  • 14:44 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in eqiad: None - None
  • 14:44 oblivian@cumin1001: START - Cookbook sre.discovery.datacenter status all services in eqiad: None - None
  • 14:44 claime: Services switched over to codfw - T329193
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:22 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:21 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover - T330651
  • 14:21 claime: switching services over to codfw - T330651
  • 14:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:08 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:59 marostegui: Create dummy and empty enwiki.text table on db2186:3311 to test check_private_data T326596
  • 13:49 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:32 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 13:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:20 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:11 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:10 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:08 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:05 jnuche@deploy1002: Installation of scap version "latest" completed for 550 hosts
  • 13:05 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:04 jnuche@deploy1002: Installing scap version "latest" for 550 hosts
  • 13:04 claime: Locking scap deployments for service switchover - T330651
  • 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:02 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:00 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:58 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:56 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:56 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:55 root@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 12:54 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:53 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2001.codfw.wmnet with OS bullseye
  • 12:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:47 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:47 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:41 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:40 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2007.codfw.wmnet with OS bullseye
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2004.codfw.wmnet with OS bullseye
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:38 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2006.codfw.wmnet with OS bullseye
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
  • 12:31 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2008.codfw.wmnet with OS bullseye
  • 12:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2003.codfw.wmnet with OS bullseye
  • 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2002.codfw.wmnet with OS bullseye
  • 12:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 12:22 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
  • 12:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
  • 12:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 12:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 12:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
  • 12:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
  • 12:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 12:12 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
  • 12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 12:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 12:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 12:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 11:56 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2006.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2008.codfw.wmnet with OS bullseye
  • 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bullseye
  • 11:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2003.codfw.wmnet with OS bullseye
  • 11:49 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bullseye
  • 11:48 root@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2001.codfw.wmnet with OS bullseye
  • 11:39 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet with OS bullseye
  • 11:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
  • 11:21 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
  • 11:21 marostegui: Install MariaDB 11.0.1 on db1106 T330643
  • 11:11 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl2002.codfw.wmnet with OS bullseye
  • 11:11 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet with OS bullseye
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 10:55 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 10:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir-ulsfo
  • 10:51 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:50 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir-ulsfo
  • 10:49 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 10:49 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 10:38 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl2001.codfw.wmnet with OS bullseye
  • 10:33 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 10:32 root@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade to k8s 1.23
  • 10:32 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 10:29 jnuche@deploy1002: Installation of scap version "latest" completed for 8 hosts
  • 10:29 jnuche@deploy1002: Installing scap version "latest" for 8 hosts
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 23951
  • 10:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 23951
  • 10:21 moritzm: installing apr-util security updates on buster
  • 10:19 jnuche@deploy1002: Installation of scap version "latest" completed for 1 hosts
  • 10:19 jnuche@deploy1002: Installing scap version "latest" for 1 hosts
  • 10:15 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2002.codfw.wmnet with OS bullseye
  • 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2003.codfw.wmnet with OS bullseye
  • 10:13 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 10:10 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
  • 10:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
  • 10:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 10:00 klausman@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=inference,name=codfw
  • 09:55 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2003.codfw.wmnet with OS bullseye
  • 09:55 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2002.codfw.wmnet with OS bullseye
  • 09:54 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 09:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 16 hosts with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on 16 hosts with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 09:46 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php --wiki azwikimedia --bureaucrat Zabe REDACTED
  • 09:13 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 09:13 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.25 refs T325588
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 09:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 08:52 moritzm: restarting r/w slapd to pick up openssl security updates
  • 08:43 vgutierrez: enable system hardening for haproxy in ulsfo - T323944
  • 08:11 moritzm: installing openssl security updates on buster
  • 07:53 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:51 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:04 marostegui: Stop mysql on db2094 T326596
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1820
  • 07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1820
  • 07:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56099
  • 07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56099
  • 07:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4621
  • 06:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4621
  • 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18187
  • 06:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18187
  • 06:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23951
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23951
  • 06:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1267
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1267
  • 06:51 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 07m 46s)
  • 06:45 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:43 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 06:41 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653) (duration: 07m 54s)
  • 06:35 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 06:33 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653)
  • 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.23 (duration: 02m 18s)
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.25 refs T325588 (duration: 53m 02s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.25 refs T325588

2023-02-27

  • 23:54 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:51 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:19 zabe@deploy1002: Finished scap: Backport for Add `guc` and `gur` to InterwikiSortOrders (duration: 07m 41s)
  • 23:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:13 zabe@deploy1002: jhsoby and zabe: Backport for Add `guc` and `gur` to InterwikiSortOrders synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 23:12 zabe@deploy1002: Started scap: Backport for Add `guc` and `gur` to InterwikiSortOrders
  • 23:09 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:01 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:00 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 22:36 mutante: switching https://annual.wikimedia.org from eqiad to codfw T330090
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2073']
  • 22:31 ryankemper: [apifeatureusage] T329957 Restarted `logstash` on `apifeatureusage[1-2]001`
  • 22:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 22:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2072']
  • 22:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:16 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:09 zabe@deploy1002: Finished scap: Backport for Remove vewikimedia from deleted wikis (T320890) (duration: 07m 30s)
  • 22:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2072']
  • 22:04 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:03 zabe@deploy1002: zabe: Backport for Remove vewikimedia from deleted wikis (T320890) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2071']
  • 22:01 zabe@deploy1002: Started scap: Backport for Remove vewikimedia from deleted wikis (T320890)
  • 21:58 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2071']
  • 21:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2070']
  • 21:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 21:25 TheresNoTime: close UTC late backport window
  • 21:21 samtar@deploy1002: Finished scap: Backport for [extwiki] Change wordmark and tagline (T330588) (duration: 08m 14s)
  • 21:15 samtar@deploy1002: samtar and superpes: Backport for [extwiki] Change wordmark and tagline (T330588) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:13 samtar@deploy1002: Started scap: Backport for [extwiki] Change wordmark and tagline (T330588)
  • 21:12 samtar@deploy1002: Finished scap: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470) (duration: 08m 42s)
  • 21:05 samtar@deploy1002: superpes and samtar: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:04 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php --wiki vewikimedia --bureaucrat Zabe REDACTED
  • 21:03 samtar@deploy1002: Started scap: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470)
  • 21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:57 zabe@deploy1002: Finished scap: install Translate on vewikimedia and update interwiki cache T320890 (duration: 07m 26s)
  • 20:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2072.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:50 zabe@deploy1002: Started scap: install Translate on vewikimedia and update interwiki cache T320890
  • 20:50 zabe@deploy1002: sync-world aborted: install Translate on vewikimedia and update interwiki cache (duration: 00m 06s)
  • 20:50 zabe@deploy1002: Started scap: install Translate on vewikimedia and update interwiki cache
  • 20:49 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:48 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:48 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:48 zabe@deploy1002: Finished scap: create vewikimedia T320890 (duration: 07m 29s)
  • 20:42 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:42 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:40 zabe@deploy1002: Started scap: create vewikimedia T320890
  • 20:39 zabe: create Wikimedia Venezuela wiki # T320890
  • 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2072.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 20:11 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:07 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:06 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:55 urandom: power cycling restbase1026
  • 19:42 zabe@deploy1002: Finished scap: create gurwiki T327813 (duration: 07m 19s)
  • 19:38 samtar@deploy1002: Backport cancelled.
  • 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2070.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:35 zabe@deploy1002: Started scap: create gurwiki T327813
  • 19:34 zabe: create Wikipedia Farefare (Gurene) # T327813
  • 19:18 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2070.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-be nodes - pt1979@cumin2002"
  • 18:37 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 18:37 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 18:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-be nodes - pt1979@cumin2002"
  • 18:29 zabe: start running "foreachwikiindblist s3.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
  • 18:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:24 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 18:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c8dc6d5]: cirrus namespaces: Work arround missing domain_name in upstream (duration: 02m 29s)
  • 18:07 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c8dc6d5]: cirrus namespaces: Work arround missing domain_name in upstream
  • 18:03 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:03 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 18:01 zabe@deploy1002: Synchronized wmf-config/interwiki.php: (no justification provided) (duration: 06m 54s)
  • 17:38 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 17:38 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:38 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:37 zabe@deploy1002: Finished scap: create gucwiki T321880 (duration: 11m 05s)
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:35 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:33 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:28 zabe@deploy1002: zabe: create gucwiki T321880 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 17:26 zabe@deploy1002: Started scap: create gucwiki T321880
  • 17:22 zabe: create Wikipedia Wayuu # T321880
  • 17:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1004.eqiad.wmnet with reason: host reimage
  • 17:09 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1004.eqiad.wmnet with reason: host reimage
  • 16:54 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:54 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2022.codfw.wmnet with OS bullseye
  • 16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 16:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 16:25 jgleeson: payments-wiki updated from c13b8d26 to 871c4e5c
  • 16:25 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2022.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2022.codfw.wmnet with reason: host reimage
  • 16:08 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 16:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 16:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 16:02 hashar@deploy1002: Finished deploy [integration/docroot@cd7c263]: build: Pin PHPUnit to 9.5.28 like in other repos (duration: 00m 12s)
  • 16:02 hashar@deploy1002: Started deploy [integration/docroot@cd7c263]: build: Pin PHPUnit to 9.5.28 like in other repos
  • 15:58 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1004']
  • 15:56 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 15:52 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1004']
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 15:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 15:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:48 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1004']
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 15:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
  • 15:41 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1004']
  • 15:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 15:41 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 15:40 urbanecm@deploy1002: Finished scap: Backport for cswiki: Grant changetags only to bots/sysops (T330383) (duration: 07m 39s)
  • 15:36 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 15:35 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 15:34 urbanecm@deploy1002: urbanecm: Backport for cswiki: Grant changetags only to bots/sysops (T330383) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44888 and previous config saved to /var/cache/conftool/dbconfig/20230227-153324-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44887 and previous config saved to /var/cache/conftool/dbconfig/20230227-153318-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44886 and previous config saved to /var/cache/conftool/dbconfig/20230227-153313-root.json
  • 15:32 urbanecm@deploy1002: Started scap: Backport for cswiki: Grant changetags only to bots/sysops (T330383)
  • 15:24 cgoubert@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 cgoubert@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1003.eqiad.wmnet with reason: host reimage
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44884 and previous config saved to /var/cache/conftool/dbconfig/20230227-151836-root.json
  • 15:18 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1003.eqiad.wmnet with reason: host reimage
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44883 and previous config saved to /var/cache/conftool/dbconfig/20230227-151826-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44882 and previous config saved to /var/cache/conftool/dbconfig/20230227-151819-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44881 and previous config saved to /var/cache/conftool/dbconfig/20230227-151813-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44880 and previous config saved to /var/cache/conftool/dbconfig/20230227-151808-root.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44878 and previous config saved to /var/cache/conftool/dbconfig/20230227-151434-root.json
  • 15:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:11 inflatador: bking@deploy1002 applying https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/891577 on dse-k8s-cluster via helmfile
  • 15:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 15:08 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 15:06 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44877 and previous config saved to /var/cache/conftool/dbconfig/20230227-150535-root.json
  • 15:04 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44876 and previous config saved to /var/cache/conftool/dbconfig/20230227-150331-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44875 and previous config saved to /var/cache/conftool/dbconfig/20230227-150322-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44874 and previous config saved to /var/cache/conftool/dbconfig/20230227-150315-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44873 and previous config saved to /var/cache/conftool/dbconfig/20230227-150309-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44872 and previous config saved to /var/cache/conftool/dbconfig/20230227-150304-root.json
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44871 and previous config saved to /var/cache/conftool/dbconfig/20230227-145929-root.json
  • 14:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 14:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:52 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44870 and previous config saved to /var/cache/conftool/dbconfig/20230227-145030-root.json
  • 14:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44869 and previous config saved to /var/cache/conftool/dbconfig/20230227-144826-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44868 and previous config saved to /var/cache/conftool/dbconfig/20230227-144816-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44867 and previous config saved to /var/cache/conftool/dbconfig/20230227-144810-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44866 and previous config saved to /var/cache/conftool/dbconfig/20230227-144804-root.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44865 and previous config saved to /var/cache/conftool/dbconfig/20230227-144759-root.json
  • 14:45 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44864 and previous config saved to /var/cache/conftool/dbconfig/20230227-144424-root.json
  • 14:35 claime: done live testing sre.switchdc.mediawiki.03-set-db-readonly and sre.switchdc.mediawiki.06-set-db-readwrite back to back - T330302
  • 14:35 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44863 and previous config saved to /var/cache/conftool/dbconfig/20230227-143525-root.json
  • 14:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:34 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:34 claime: live testing sre.switchdc.mediawiki.03-set-db-readonly and sre.switchdc.mediawiki.06-set-db-readwrite back to back - T330302
  • 14:33 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44862 and previous config saved to /var/cache/conftool/dbconfig/20230227-143321-root.json
  • 14:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44861 and previous config saved to /var/cache/conftool/dbconfig/20230227-143311-root.json
  • 14:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44860 and previous config saved to /var/cache/conftool/dbconfig/20230227-143305-root.json
  • 14:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44859 and previous config saved to /var/cache/conftool/dbconfig/20230227-143259-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44858 and previous config saved to /var/cache/conftool/dbconfig/20230227-143254-root.json
  • 14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44857 and previous config saved to /var/cache/conftool/dbconfig/20230227-142919-root.json
  • 14:28 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44855 and previous config saved to /var/cache/conftool/dbconfig/20230227-142020-root.json
  • 14:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44854 and previous config saved to /var/cache/conftool/dbconfig/20230227-141815-root.json
  • 14:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44853 and previous config saved to /var/cache/conftool/dbconfig/20230227-141806-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44852 and previous config saved to /var/cache/conftool/dbconfig/20230227-141800-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44851 and previous config saved to /var/cache/conftool/dbconfig/20230227-141754-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44850 and previous config saved to /var/cache/conftool/dbconfig/20230227-141749-root.json
  • 14:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:14 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44849 and previous config saved to /var/cache/conftool/dbconfig/20230227-141415-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44848 and previous config saved to /var/cache/conftool/dbconfig/20230227-141130-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44847 and previous config saved to /var/cache/conftool/dbconfig/20230227-141120-root.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44846 and previous config saved to /var/cache/conftool/dbconfig/20230227-140811-root.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44845 and previous config saved to /var/cache/conftool/dbconfig/20230227-140707-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44844 and previous config saved to /var/cache/conftool/dbconfig/20230227-140527-root.json
  • 14:05 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileContext.php: Completely get rid of responsiveimages removal, part III (T326147) (duration: 07m 36s)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44843 and previous config saved to /var/cache/conftool/dbconfig/20230227-140523-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44842 and previous config saved to /var/cache/conftool/dbconfig/20230227-140515-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44841 and previous config saved to /var/cache/conftool/dbconfig/20230227-140310-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44840 and previous config saved to /var/cache/conftool/dbconfig/20230227-140301-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44839 and previous config saved to /var/cache/conftool/dbconfig/20230227-140255-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44838 and previous config saved to /var/cache/conftool/dbconfig/20230227-140249-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44837 and previous config saved to /var/cache/conftool/dbconfig/20230227-140244-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44836 and previous config saved to /var/cache/conftool/dbconfig/20230227-135910-root.json
  • 13:59 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part III (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178 db2146 db2180 T330653', diff saved to https://phabricator.wikimedia.org/P44835 and previous config saved to /var/cache/conftool/dbconfig/20230227-135856-root.json
  • 13:58 moritzm: restarting apache on mw canaries to pick up apr-util updates
  • 13:56 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileFrontendHooks.php: Completely get rid of responsiveimages removal, part II (T326147) (duration: 07m 24s)
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44834 and previous config saved to /var/cache/conftool/dbconfig/20230227-135625-root.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44833 and previous config saved to /var/cache/conftool/dbconfig/20230227-135615-root.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44832 and previous config saved to /var/cache/conftool/dbconfig/20230227-135306-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44831 and previous config saved to /var/cache/conftool/dbconfig/20230227-135225-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44830 and previous config saved to /var/cache/conftool/dbconfig/20230227-135202-root.json
  • 13:50 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part II (T326147) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44828 and previous config saved to /var/cache/conftool/dbconfig/20230227-135023-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44827 and previous config saved to /var/cache/conftool/dbconfig/20230227-135018-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44826 and previous config saved to /var/cache/conftool/dbconfig/20230227-135011-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44825 and previous config saved to /var/cache/conftool/dbconfig/20230227-135010-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44824 and previous config saved to /var/cache/conftool/dbconfig/20230227-134756-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44823 and previous config saved to /var/cache/conftool/dbconfig/20230227-134753-root.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44821 and previous config saved to /var/cache/conftool/dbconfig/20230227-134405-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44819 and previous config saved to /var/cache/conftool/dbconfig/20230227-134120-root.json
  • 13:41 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/extension.json: Completely get rid of responsiveimages removal, part I (T326147) (duration: 10m 48s)
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44818 and previous config saved to /var/cache/conftool/dbconfig/20230227-134110-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T330653', diff saved to https://phabricator.wikimedia.org/P44817 and previous config saved to /var/cache/conftool/dbconfig/20230227-134018-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44815 and previous config saved to /var/cache/conftool/dbconfig/20230227-133801-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44814 and previous config saved to /var/cache/conftool/dbconfig/20230227-133720-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44813 and previous config saved to /var/cache/conftool/dbconfig/20230227-133657-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44811 and previous config saved to /var/cache/conftool/dbconfig/20230227-133518-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44810 and previous config saved to /var/cache/conftool/dbconfig/20230227-133513-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44809 and previous config saved to /var/cache/conftool/dbconfig/20230227-133506-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44808 and previous config saved to /var/cache/conftool/dbconfig/20230227-133506-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175 T330653', diff saved to https://phabricator.wikimedia.org/P44805 and previous config saved to /var/cache/conftool/dbconfig/20230227-133231-root.json
  • 13:32 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part I (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:30 ladsgroup@deploy1002: sync-file aborted: Completely get rid of responsiveimages removal, part I (T308932) (duration: 44m 38s)
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44800 and previous config saved to /var/cache/conftool/dbconfig/20230227-132615-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44799 and previous config saved to /var/cache/conftool/dbconfig/20230227-132605-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44798 and previous config saved to /var/cache/conftool/dbconfig/20230227-132257-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44797 and previous config saved to /var/cache/conftool/dbconfig/20230227-132215-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44796 and previous config saved to /var/cache/conftool/dbconfig/20230227-132151-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44795 and previous config saved to /var/cache/conftool/dbconfig/20230227-132013-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44794 and previous config saved to /var/cache/conftool/dbconfig/20230227-132008-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44793 and previous config saved to /var/cache/conftool/dbconfig/20230227-132002-root.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44791 and previous config saved to /var/cache/conftool/dbconfig/20230227-131100-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44790 and previous config saved to /var/cache/conftool/dbconfig/20230227-130752-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44789 and previous config saved to /var/cache/conftool/dbconfig/20230227-130711-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44788 and previous config saved to /var/cache/conftool/dbconfig/20230227-130646-root.json
  • 13:05 moritzm: installing openssl security updates on Buster
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44787 and previous config saved to /var/cache/conftool/dbconfig/20230227-130508-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44786 and previous config saved to /var/cache/conftool/dbconfig/20230227-130503-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44785 and previous config saved to /var/cache/conftool/dbconfig/20230227-130457-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44784 and previous config saved to /var/cache/conftool/dbconfig/20230227-125605-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44783 and previous config saved to /var/cache/conftool/dbconfig/20230227-125555-root.json
  • 12:55 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part I (T308932) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44782 and previous config saved to /var/cache/conftool/dbconfig/20230227-125247-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44781 and previous config saved to /var/cache/conftool/dbconfig/20230227-125206-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44780 and previous config saved to /var/cache/conftool/dbconfig/20230227-125141-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44779 and previous config saved to /var/cache/conftool/dbconfig/20230227-125003-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44778 and previous config saved to /var/cache/conftool/dbconfig/20230227-124959-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44777 and previous config saved to /var/cache/conftool/dbconfig/20230227-124952-root.json
  • 12:43 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44776 and previous config saved to /var/cache/conftool/dbconfig/20230227-124100-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44775 and previous config saved to /var/cache/conftool/dbconfig/20230227-124050-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es2022 T330653', diff saved to https://phabricator.wikimedia.org/P44774 and previous config saved to /var/cache/conftool/dbconfig/20230227-123814-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44773 and previous config saved to /var/cache/conftool/dbconfig/20230227-123742-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44772 and previous config saved to /var/cache/conftool/dbconfig/20230227-123701-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P44771 and previous config saved to /var/cache/conftool/dbconfig/20230227-123636-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 T330653', diff saved to https://phabricator.wikimedia.org/P44770 and previous config saved to /var/cache/conftool/dbconfig/20230227-123514-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44769 and previous config saved to /var/cache/conftool/dbconfig/20230227-123459-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44768 and previous config saved to /var/cache/conftool/dbconfig/20230227-123454-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44767 and previous config saved to /var/cache/conftool/dbconfig/20230227-123447-root.json
  • 12:34 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:31 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200 db1111 db1168 db1143 T330653', diff saved to https://phabricator.wikimedia.org/P44766 and previous config saved to /var/cache/conftool/dbconfig/20230227-122804-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44765 and previous config saved to /var/cache/conftool/dbconfig/20230227-122131-root.json
  • 12:21 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P44764 and previous config saved to /var/cache/conftool/dbconfig/20230227-121846-root.json
  • 12:12 moritzm: installing apr-util security updates on buster
  • 12:10 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: host still being configured - T327970
  • 12:10 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: host still being configured - T327970
  • 12:04 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P44763 and previous config saved to /var/cache/conftool/dbconfig/20230227-120002-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44762 and previous config saved to /var/cache/conftool/dbconfig/20230227-115947-root.json
  • 11:53 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:49 vgutierrez: set "X-Content-Type-Options: nosniff" on upload.wm.o requests - T309787
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P44761 and previous config saved to /var/cache/conftool/dbconfig/20230227-114442-root.json
  • 11:43 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: Add azwikimedia T317120 (duration: 01m 25s)
  • 11:42 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: Add azwikimedia T317120
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P44760 and previous config saved to /var/cache/conftool/dbconfig/20230227-113130-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44759 and previous config saved to /var/cache/conftool/dbconfig/20230227-112937-root.json
  • 11:29 apergos: rsync public (huge!) xmldatadumps dir from dumpsdata1003 to dumpsdata1004; running from ariel screen session on dumpsdata1003, no bandwidth cap
  • 11:28 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-fe2013.codfw.wmnet with reason: testing redfish T326848
  • 11:28 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-fe2013.codfw.wmnet with reason: testing redfish T326848
  • 11:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:10 apergos: rsync private xmldatadumps dir from dumpsdata1003 to dumpsdata1004; running from ariel screen session on dumpsdata1003, no bandwidth cap
  • 11:08 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:04 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:54 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:43 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:39 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:32 marostegui: Restart eqiad sanitarium hosts T330502
  • 10:31 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 10:26 marostegui: Restart codfw sanitarium hosts T330502
  • 10:23 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:23 dcaro@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
  • 10:19 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:19 claime: live testing cache warmup cookbook
  • 10:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: Running failover to gitlab2002- T329931
  • 10:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: Running failover to gitlab2002- T329931
  • 10:08 marostegui: Enable replication codfw -> eqiad on s4 T330619
  • 09:47 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:46 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:44 marostegui: Enable replication codfw -> eqiad on s1 T330619
  • 09:39 marostegui: Enable replication codfw -> eqiad on s5 T330619
  • 09:36 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:36 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:34 marostegui: Enable replication codfw -> eqiad on s6 T330619
  • 09:32 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:27 marostegui: Enable replication codfw -> eqiad on s7 T330619
  • 09:26 marostegui: Enable replication codfw -> eqiad on s8 T330619
  • 09:20 hashar: Restarting CI Jenkins T330045
  • 09:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:16 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided) (duration: 00m 46s)
  • 09:15 hashar@deploy1002: Started deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided)
  • 09:12 marostegui: Enable replication codfw -> eqiad on s3 T330619
  • 08:56 moritzm: updating mw/codfw to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 08:54 vgutierrez: test haproxy hardening in cp4045 - T323944
  • 08:51 marostegui: Disable GTID on es% x1 and s% on codfw masters T330619
  • 08:51 marostegui: Enable replication codfw -> eqiad on s2 T330619
  • 08:32 marostegui: Enable replication codfw -> eqiad on es4 and es5 T330619
  • 07:57 marostegui: Enable replication codfw -> eqiad on x1 T330619
  • 07:42 marostegui: Enable replication codfw -> eqiad on pcX T330619
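
The many "(re)pooling @ N%" entries for 2023-02-27 are easier to follow when grouped per host. The following is a minimal Python sketch for doing that, not part of any Wikimedia tooling: the function name `repool_steps` and the regular expression are assumptions made for illustration, and the sample lines are entries from this log cut off after the quoted commit message.

```python
import re
from collections import defaultdict

# Hypothetical pattern for entries such as:
#   14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling'
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>\w+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} in chronological order.

    The log lists the newest entry first, so the input is walked in reverse.
    Lines that are not repooling commits (for example 'Depool ...') are skipped.
    """
    steps = defaultdict(list)
    for line in reversed(list(lines)):
        match = REPOOL_RE.search(line)
        if match:
            steps[match["host"]].append((match["time"], int(match["pct"])))
    return dict(steps)


# Entries for db2122 copied from the log, truncated after the commit message.
sample = [
    "• 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling'",
    "• 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling'",
    "• 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling'",
    "• 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling'",
    "• 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling'",
    "• 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T330653'",
]

for host, ramp in repool_steps(sample).items():
    print(host, ramp)
```

For the sample above this lists db2122 going from 1% at 13:44 to 50% at 14:44 in steps roughly 15 minutes apart, which matches the pattern seen for the other hosts repooled that day.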

2023-02-26

  • 02:07 Amir1: foreachwikiindblist s5 maintenance/migrateExternallinks.php --batch-size=100 --sleep 1 (T326314)

2023-02-25

  • 15:30 apergos: resized lvm and filesystem for /data on dumpsdata1004,5,7; was <100G, now is 38T usable (left some room for growth later)
  • 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade
  • 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade
  • 09:38 elukey: delete knative pods on ml-serve-codfw to clear latency alerts

2023-02-24

  • 23:15 mutante: people2002 - for each user who has a public_html dir that is not empty (for pubdir in $(find . -name public_html -type d -not -empty); ..); rsync it from people1003 with --delete (rsync -avp rsync://people1003.eqiad.wmnet/people-home/${pubdiruser}/public_html/ /home/${pubdiruser}/public_html/); T330091
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2013.codfw.wmnet with OS bullseye
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 21:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2013.codfw.wmnet with OS bullseye
  • 21:26 mutante: people2002 - performing the usual dance when device names changed after editing virtual hardware (s/ens13/ens14 in /etc/network/interfaces ... reboot)
  • 21:19 mutante: rebooting people2002
  • 21:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3002.wikimedia.org with OS bullseye
  • 21:06 mutante: ganeti2021 - adding a virtual 20G disk to people2002 to temporarily get some space for backups and syncing T330091
  • 20:59 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 20:59 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 20:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3002.wikimedia.org with reason: host reimage
  • 20:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3002.wikimedia.org with reason: host reimage
  • 20:46 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 20:45 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2014']
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2013']
  • 20:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3002.wikimedia.org with OS bullseye
  • 20:11 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3002.wikimedia.org with OS bullseye
  • 20:06 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 19:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2014']
  • 19:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 19:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2014']
  • 19:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2013']
  • 19:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 19:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2014']
  • 19:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3002.wikimedia.org with OS bullseye
  • 19:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-fe2013']
  • 19:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:00 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS bullseye
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 18:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:38 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2022.codfw.wmnet with OS bullseye
  • 18:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 18:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS bullseye
  • 18:13 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 17:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:03 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 17:03 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 16:57 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 26s)
  • 16:57 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 16:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS buster
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2020.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new url downloaders - jmm@cumin2002 - T329945"
  • 15:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new url downloaders - jmm@cumin2002 - T329945"
  • 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2020.codfw.wmnet with reason: host reimage
  • 14:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2020.codfw.wmnet with reason: host reimage
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 14:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 14:50 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['wdqs2013']
  • 14:49 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 14:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 14:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 14:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 14:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:23 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 14:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:17 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 12:26 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader2004.wikimedia.org with OS bullseye
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader2004.wikimedia.org with reason: host reimage
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader2004.wikimedia.org with reason: host reimage
  • 11:37 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader2004.wikimedia.org with OS bullseye
  • 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader2003.wikimedia.org with OS bullseye
  • 10:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 10:46 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:46 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:45 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:45 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader2003.wikimedia.org with reason: host reimage
  • 10:44 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:44 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader2003.wikimedia.org with reason: host reimage
  • 10:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:32 moritzm: installing emacs security updates on bullseye
  • 10:32 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:32 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 10:13 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader2003.wikimedia.org with OS bullseye
  • 10:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 10:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 10:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 10:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 09:32 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 09:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 09:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 09:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 09:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 09:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 09:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 09:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 09:08 elukey: rm /var/log/{syslog,messages,user.log}.1 on kubetcd1005 to free up space - T329717
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:40 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host dse-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:10 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host dse-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:06 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 8 hosts with reason: Downtime DSE workers for cluster upgrade
  • 07:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 8 hosts with reason: Downtime DSE workers for cluster upgrade
  • 07:52 elukey: rm /var/log/{syslog,messages,user.log}.1 on kubetcd1006 to free up space - T329717
  • 03:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS bullseye
  • 03:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2020.codfw.wmnet with OS bullseye
  • 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 03:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 03:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2019.codfw.wmnet with OS bullseye
  • 03:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2018.codfw.wmnet with OS bullseye
  • 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 03:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:58 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2019.codfw.wmnet with reason: host reimage
  • 02:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2019.codfw.wmnet with reason: host reimage
  • 02:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2018.codfw.wmnet with reason: host reimage
  • 02:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2018.codfw.wmnet with reason: host reimage
  • 02:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2018.codfw.wmnet with OS bullseye
  • 02:18 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2018.codfw.wmnet with OS bullseye
  • 02:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2019.codfw.wmnet with OS bullseye
  • 02:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2015.codfw.wmnet with OS bullseye
  • 02:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2018.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2017.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2016.codfw.wmnet with OS bullseye
  • 01:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:49 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:48 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 01:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 01:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 01:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 01:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2016.codfw.wmnet with OS bullseye
  • 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2014.codfw.wmnet with OS bullseye
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2013.codfw.wmnet with OS bullseye
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:13 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply JRE updates - bking@cumin1001 - T329957

2023-02-23

  • 23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:25 zabe: mwscript namespaceDupes.php shnwiktionary --fix # T330456
  • 23:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS bullseye
  • 22:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 22:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 22:34 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply JRE updates - bking@cumin1001 - T329957
  • 22:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS bullseye
  • 22:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS bullseye
  • 22:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2013.codfw.wmnet with reason: host reimage
  • 22:11 zabe@deploy1002: Synchronized wmf-config/interwiki.php: gerrit:891395 (duration: 07m 11s)
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2013.codfw.wmnet with reason: host reimage
  • 21:56 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T306015 (duration: 06m 49s)
  • 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 21:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:45 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:44 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:44 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:43 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2013.codfw.wmnet with OS bullseye
  • 21:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:37 zabe@deploy1002: Finished scap: create azwikimedia T306015 (duration: 07m 54s)
  • 21:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply JRE updates - bking@cumin1001 - T329957
  • 21:31 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:29 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki newiki reviewer` for T327114
  • 21:29 zabe@deploy1002: Started scap: create azwikimedia T306015
  • 21:25 zabe: create Azerbaijani Wikimedians User Group wiki # T306015
  • 21:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs2013']
  • 21:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 21:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS bullseye
  • 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2022']
  • 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2022']
  • 20:45 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:25 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:21 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:18 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host relforge1004.eqiad.wmnet
  • 20:10 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host relforge1004.eqiad.wmnet
  • 20:09 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 3 (duration: 03m 13s)
  • 20:08 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:06 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 3
  • 20:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 20:05 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host relforge1003.eqiad.wmnet
  • 19:55 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 2 (duration: 01m 04s)
  • 19:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply JRE updates - bking@cumin1001 - T329957
  • 19:53 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 2
  • 19:51 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001 (duration: 01m 10s)
  • 19:51 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host relforge1003.eqiad.wmnet
  • 19:50 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001
  • 19:50 mutante: aphlict2001 - manually created /etc/phabricator/config.yaml - empty file owned by root:phab-deploy to debug for T330393 T322369
  • 19:46 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 19:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS bullseye
  • 18:53 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:53 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 18:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:49 sukhe: run puppet agent on puppetdb2003
  • 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2022']
  • 18:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2022']
  • 18:38 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:36 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:36 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:36 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:35 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:35 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:34 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:34 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 18:24 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 18:15 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 18:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS bullseye
  • 18:14 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 18:08 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 17:56 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 17:55 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS bullseye
  • 17:46 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:45 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 27s)
  • 17:44 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:44 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 03m 32s)
  • 17:43 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:43 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 17:41 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:36 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 17:36 cdanis@cumin1001: dbctl commit (dc=all): 'so hot right now', diff saved to https://phabricator.wikimedia.org/P44753 and previous config saved to /var/cache/conftool/dbconfig/20230223-173608-cdanis.json
  • 17:31 cdanis@cumin1001: dbctl commit (dc=all): 'db1127 running very hot', diff saved to https://phabricator.wikimedia.org/P44752 and previous config saved to /var/cache/conftool/dbconfig/20230223-173127-cdanis.json
  • 17:31 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 17:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 17:07 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 16:58 papaul: replacing fpc2 to fpc1 DAC cable complete
  • 16:52 papaul: replacing fpc2 to fpc1 DAC cable
  • 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:42 hnowlan@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:41 hnowlan: eqiad: roll-restarting swift frontends and thumbor hosts for key rotation
  • 16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:26 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:15 hnowlan@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:14 hnowlan: codfw: roll-restarting swift frontends and thumbor hosts for key rotation
  • 16:13 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:13 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2020']
  • 16:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2020']
  • 16:04 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 16:03 moritzm: installing c-ares security updates
  • 16:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1002
  • 16:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1002
  • 16:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1001
  • 16:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1001
  • 16:03 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 16:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:00 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:57 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 15:51 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 15:45 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:42 klausman@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:38 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:36 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move more ns-related config out of InitialiseSettings, part II (T308932) (duration: 06m 35s)
  • 15:22 klausman@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:15 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 15:15 sukhe: running authdns-update for CR 891569
  • 15:15 ladsgroup@deploy1002: Synchronized wmf-config/core-Namespaces.php: Move more ns-related config out of InitialiseSettings, part I (T308932) (duration: 07m 01s)
  • 14:56 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 14:55 ladsgroup@deploy1002: Finished scap: Backport for [shnwiktionary] Create 8 new namespaces (T330376) (duration: 09m 21s)
  • 14:48 ladsgroup@deploy1002: ladsgroup and superpes: Backport for [shnwiktionary] Create 8 new namespaces (T330376) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:45 ladsgroup@deploy1002: Started scap: Backport for [shnwiktionary] Create 8 new namespaces (T330376)
  • 14:45 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:43 ladsgroup@deploy1002: Finished scap: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279) (duration: 09m 58s)
  • 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:41 volans: installed spicerack 6.2.2 to cumin hosts
  • 14:38 volans: uploaded spicerack_6.2.2 to apt.wikimedia.org bullseye-wikimedia
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2020']
  • 14:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2019']
  • 14:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2018']
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2017']
  • 14:35 ladsgroup@deploy1002: ladsgroup and superpes: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:33 ladsgroup@deploy1002: Started scap: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279)
  • 14:30 ladsgroup@deploy1002: Finished scap: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363) (duration: 11m 10s)
  • 14:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:29 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2020']
  • 14:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2016']
  • 14:28 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2019']
  • 14:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2015']
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2018']
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017']
  • 14:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2014']
  • 14:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2013']
  • 14:21 ladsgroup@deploy1002: stang and ladsgroup: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2016']
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2015']
  • 14:19 ladsgroup@deploy1002: Started scap: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363)
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2014']
  • 14:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 14:13 akosiaris: upgrade istio in wikikube codfw, staging-eqiad, staging-codfw to 1.15.3-2 to re-enable istio metrics
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44749 and previous config saved to /var/cache/conftool/dbconfig/20230223-120449-root.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44747 and previous config saved to /var/cache/conftool/dbconfig/20230223-114944-root.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44746 and previous config saved to /var/cache/conftool/dbconfig/20230223-113440-root.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44744 and previous config saved to /var/cache/conftool/dbconfig/20230223-111935-root.json
  • 10:59 moritzm: updating mw canaries to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 09:51 moritzm: uploaded php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 to component/php74 T330270
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1004.eqiad.wmnet
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:50 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:47 moritzm: uploaded php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 to component/php74 T323358
  • 09:37 elukey: powercycle thumbor1005 - OEM event for DIMM B1 detected in `getsel`, no tty available via mgmt console
  • 09:32 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 09:23 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1004.eqiad.wmnet
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1003.eqiad.wmnet
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:12 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.24 refs T325587
  • 09:09 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 08:45 apergos: UTC morning backport and config training done
  • 08:42 kartik@deploy1002: Finished scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) (duration: 10m 27s)
  • 08:36 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1003.eqiad.wmnet
  • 08:33 kartik@deploy1002: kartik: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:31 kartik@deploy1002: Started scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893)
  • 08:30 kartik@deploy1002: Finished scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) (duration: 13m 08s)
  • 08:19 kartik@deploy1002: kartik: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:17 kartik@deploy1002: Started scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893)
  • 07:49 hashar: operations/mediawiki-config will now run `tox` to verify logos | T329231 | https://gerrit.wikimedia.org/r/c/integration/config/+/891317
  • 02:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS ms-be2070 - pt1979@cumin2002"
  • 02:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS ms-be2070 - pt1979@cumin2002"
  • 02:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2070
  • 02:46 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2070
  • 02:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2018.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2018.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new wdqs nodes - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new wdqs nodes - pt1979@cumin2002"
  • 01:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:47 zabe@deploy1002: Finished scap: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575) (duration: 08m 33s)
  • 00:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:40 zabe@deploy1002: aklapper and zabe: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 00:39 zabe@deploy1002: Started scap: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575)
  • 00:36 zabe@deploy1002: Finished scap: Backport for throttle: Remove expired rules (duration: 08m 36s)
  • 00:29 zabe@deploy1002: zabe: Backport for throttle: Remove expired rules synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 00:27 zabe@deploy1002: Started scap: Backport for throttle: Remove expired rules

2023-02-22

  • 23:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:07 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:56 zabe@deploy1002: Synchronized wmf-config/interwiki.php: T230382 (duration: 07m 06s)
  • 21:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:40 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:39 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:36 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:31 urbanecm@deploy1002: Finished scap: Backport for Build backend for PersonalizedPraise (T322444) (duration: 07m 22s)
  • 21:24 urbanecm@deploy1002: Started scap: Backport for Build backend for PersonalizedPraise (T322444)
  • 21:10 urbanecm@deploy1002: Finished scap: Backport for Growth: Set GEPersonalizedPraiseBackendEnabled to true on pilot wikis (T322444) (duration: 07m 33s)
  • 21:02 urbanecm@deploy1002: Started scap: Backport for Growth: Set GEPersonalizedPraiseBackendEnabled to true on pilot wikis (T322444)
  • 20:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[26-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[26-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:14 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[19-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:08 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:07 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44743 and previous config saved to /var/cache/conftool/dbconfig/20230222-193422-root.json
  • 19:34 mforns: restarted the following an-launcher1002 timers, which seemed stuck (next run = n/a): gobblin-webrequest.timer, reportupdater-browser.timer, reportupdater-reference-previews.timer, refine_event.timer, refine_eventlogging_legacy.timer
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44742 and previous config saved to /var/cache/conftool/dbconfig/20230222-191918-root.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44741 and previous config saved to /var/cache/conftool/dbconfig/20230222-190413-root.json
  • 18:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[19-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44740 and previous config saved to /var/cache/conftool/dbconfig/20230222-184908-root.json
  • 18:33 mutante: planet* - stopping and restarting all the timers for the various languages, commands from https://phabricator.wikimedia.org/P44739
  • 18:24 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'depool db1112', diff saved to https://phabricator.wikimedia.org/P44738 and previous config saved to /var/cache/conftool/dbconfig/20230222-181046-ladsgroup.json
  • 18:08 jbond: stop all failed timer services and restart the corresponding timer unit
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: T329864', diff saved to https://phabricator.wikimedia.org/P44737 and previous config saved to /var/cache/conftool/dbconfig/20230222-175434-root.json
  • 17:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:41 sukhe: force puppet run on stat1007
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: T329864', diff saved to https://phabricator.wikimedia.org/P44736 and previous config saved to /var/cache/conftool/dbconfig/20230222-173929-root.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: T329864', diff saved to https://phabricator.wikimedia.org/P44734 and previous config saved to /var/cache/conftool/dbconfig/20230222-172424-root.json
  • 17:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns4004.wikimedia.org with OS bullseye
  • 17:17 sukhe: running authdns-update for T152882 / CR 890908
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: T329864', diff saved to https://phabricator.wikimedia.org/P44733 and previous config saved to /var/cache/conftool/dbconfig/20230222-170920-root.json
  • 16:23 hashar@deploy1002: Finished deploy [integration/docroot@b32e023]: doc: Add GrowthExperiments to MediaWiki components - T329034 (duration: 00m 07s)
  • 16:23 hashar@deploy1002: Started deploy [integration/docroot@b32e023]: doc: Add GrowthExperiments to MediaWiki components - T329034
  • 16:21 hashar@deploy1002: Finished deploy [integration/docroot@956dd11]: zuul: Link to report_url if available (duration: 00m 15s)
  • 16:21 hashar@deploy1002: Started deploy [integration/docroot@956dd11]: zuul: Link to report_url if available
  • 16:17 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 16:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:58 urbanecm@deploy1002: Finished scap: Backport for [tox] Make running `tox` work (T329231) (duration: 07m 54s)
  • 15:52 urbanecm@deploy1002: urbanecm: Backport for [tox] Make running `tox` work (T329231) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:50 urbanecm@deploy1002: Started scap: Backport for [tox] Make running `tox` work (T329231)
  • 15:47 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:30 moritzm: update mwdebug2002 to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T323358
  • 15:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 14:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bullseye
  • 14:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move all of userrights config out of IS.php to a dedicated file, part III (T308932) (duration: 06m 16s)
  • 14:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@1a041e2] (releasing): (no justification provided) (duration: 00m 49s)
  • 14:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@1a041e2] (releasing): (no justification provided)
  • 14:23 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move all of userrights config out of IS.php to a dedicated file, part II (T308932) (duration: 07m 01s)
  • 14:15 ladsgroup@deploy1002: Synchronized wmf-config/core-Permissions.php: Move all of userrights config out of IS.php to a dedicated file, part I (T308932) (duration: 68m 38s)
  • 14:11 akosiaris: uncordon kubernetes20{17,18,19,21} T330048
  • 14:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 14:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 14:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:08 akosiaris: test network connectivity of kubernetes20{17,18,19,21}
  • 14:06 TheresNoTime: UTC afternoon backport window not done due to in-progress deployment
  • 14:06 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:40 moritzm: rolling restart of FPM on mw canaries
  • 12:32 moritzm: installing openssl security updates on buster
  • 12:22 akosiaris: test
  • 12:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 12:18 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-02-22 12:18:45.829060
  • 12:14 cgoubert@cumin1001: conftool action : set/val=false; selector: name=ReadOnly,scope=codfw
  • 12:13 cgoubert@cumin1001: conftool action : set/val=no; selector: name=ReadOnly,scope=codfw
  • 12:13 cgoubert@cumin1001: conftool action : set/val=False; selector: name=ReadOnly,scope=codfw
  • 12:02 moritzm: installing NSS security updates
  • 11:57 cgoubert@cumin1001: conftool action : set/val=false; selector: name=ReadOnly,scope=codfw
  • 11:49 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1166, errors not fixed', diff saved to https://phabricator.wikimedia.org/P44727 and previous config saved to /var/cache/conftool/dbconfig/20230222-114940-jynus.json
  • 11:45 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1166, seen mw errors', diff saved to https://phabricator.wikimedia.org/P44726 and previous config saved to /var/cache/conftool/dbconfig/20230222-114515-jynus.json
  • 11:26 moritzm: installing git security updates
  • 11:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 11:18 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 11:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 11:16 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 11:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 11:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 11:14 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 11:13 cgoubert@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2023-02-22 11:13:51.466468
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 11:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: Running failover to gitlab1003 - T329930
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 11:13 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: Running failover to gitlab1003 - T329930
  • 11:04 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 11:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 11:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 11:03 cgoubert@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2023-02-22 11:03:19.149671
  • 11:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 11:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 11:02 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 11:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 11:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2019.codfw.wmnet with OS bullseye
  • 10:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2021.codfw.wmnet with OS bullseye
  • 10:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2018.codfw.wmnet with OS bullseye
  • 10:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 10:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 10:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 10:22 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1005.eqiad.wmnet with OS bullseye
  • 10:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 10:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2017.codfw.wmnet with OS bullseye
  • 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 10:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 10:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 10:08 claime: Starting sre.switchdc.mediawiki live test preparation steps
  • 10:07 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 10:05 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2021.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2019.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS bullseye
  • 10:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 09:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.24 refs T325587 (duration: 06m 33s)
  • 09:52 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.24 refs T325587
  • 09:51 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1005.eqiad.wmnet with reason: host reimage
  • 09:48 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1005.eqiad.wmnet with reason: host reimage
  • 09:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2017.codfw.wmnet with OS bullseye
  • 09:30 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.23" - T325587
  • 09:14 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.24 refs T325587 (duration: 06m 38s)
  • 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.24 refs T325587
  • 09:03 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1004.eqiad.wmnet with OS bullseye
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:49 vgutierrez: rolling upgrade to HAProxy 2.6.9 in codfw, eqsin, drmrs, esams and eqiad
  • 08:47 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1004.eqiad.wmnet with reason: host reimage
  • 08:43 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1004.eqiad.wmnet with reason: host reimage
  • 08:36 ryankemper: [WDQS] Repooled `wdqs20[05,07,10]`
  • 08:22 kartik@deploy1002: Finished scap: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941) (duration: 10m 41s)
  • 08:17 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1004.eqiad.wmnet with OS bullseye
  • 08:14 kartik@deploy1002: kartik: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:12 kartik@deploy1002: Started scap: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941)
  • 08:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P44724 and previous config saved to /var/cache/conftool/dbconfig/20230222-080050-jynus.json
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2186.codfw.wmnet with OS bullseye
  • 01:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 01:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye

2023-02-21

  • 23:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44723 and previous config saved to /var/cache/conftool/dbconfig/20230221-235012-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44722 and previous config saved to /var/cache/conftool/dbconfig/20230221-233506-ladsgroup.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44721 and previous config saved to /var/cache/conftool/dbconfig/20230221-232000-ladsgroup.json
  • 23:09 tzatziki: removing 5 files for legal compliance
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44720 and previous config saved to /var/cache/conftool/dbconfig/20230221-230454-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44719 and previous config saved to /var/cache/conftool/dbconfig/20230221-230109-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44718 and previous config saved to /var/cache/conftool/dbconfig/20230221-230048-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44717 and previous config saved to /var/cache/conftool/dbconfig/20230221-224542-ladsgroup.json
  • 22:44 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1003.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:43 tzatziki: removing 15 files for legal compliance
  • 22:37 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1003.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1002.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:31 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1002.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44716 and previous config saved to /var/cache/conftool/dbconfig/20230221-223036-ladsgroup.json
  • 22:30 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1001.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1001.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:22 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2003.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:16 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2003.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:16 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2002.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44715 and previous config saved to /var/cache/conftool/dbconfig/20230221-221529-ladsgroup.json
  • 22:14 catrope@deploy1002: Finished scap: Backport for Add VueTest to extension-list, add config var (T315621) (duration: 37m 07s)
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44714 and previous config saved to /var/cache/conftool/dbconfig/20230221-221042-ladsgroup.json
  • 22:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44713 and previous config saved to /var/cache/conftool/dbconfig/20230221-221021-ladsgroup.json
  • 22:10 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2002.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:09 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:01 catrope@deploy1002: catrope: Backport for Add VueTest to extension-list, add config var (T315621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44712 and previous config saved to /var/cache/conftool/dbconfig/20230221-215515-ladsgroup.json
  • 21:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44711 and previous config saved to /var/cache/conftool/dbconfig/20230221-214009-ladsgroup.json
  • 21:37 catrope@deploy1002: Started scap: Backport for Add VueTest to extension-list, add config var (T315621)
  • 21:35 catrope@deploy1002: Finished scap: Backport for Remove wgLinterSubmitterWhitelist (T329992) (duration: 10m 36s)
  • 21:26 catrope@deploy1002: arlolra and catrope: Backport for Remove wgLinterSubmitterWhitelist (T329992) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44710 and previous config saved to /var/cache/conftool/dbconfig/20230221-212503-ladsgroup.json
  • 21:24 catrope@deploy1002: Started scap: Backport for Remove wgLinterSubmitterWhitelist (T329992)
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44709 and previous config saved to /var/cache/conftool/dbconfig/20230221-212123-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 21:19 catrope@deploy1002: Finished scap: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303) (duration: 10m 28s)
  • 21:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2186.codfw.wmnet with OS bullseye
  • 21:10 catrope@deploy1002: catrope and nray: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:08 catrope@deploy1002: Started scap: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303)
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44708 and previous config saved to /var/cache/conftool/dbconfig/20230221-205822-root.json
  • 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 20:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 20:45 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@5edcd7b]: Test deployment of search airflow dags (duration: 01m 08s)
  • 20:44 ebernhardson@deploy1002: Started deploy [airflow-dags/search@5edcd7b]: Test deployment of search airflow dags
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44707 and previous config saved to /var/cache/conftool/dbconfig/20230221-204317-root.json
  • 20:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44706 and previous config saved to /var/cache/conftool/dbconfig/20230221-202813-root.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44705 and previous config saved to /var/cache/conftool/dbconfig/20230221-201308-root.json
  • 20:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2186']
  • 20:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2186.codfw.wmnet with OS bullseye
  • 18:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:21 dancy@deploy1002: Installing scap version "4.38.0" for 564 hosts
  • 18:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T327991
  • 17:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 17:57 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 17:57 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T327991
  • 17:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:54 sukhe: run authdns-update for Gerrit: 890847. repooling codfw
  • 17:53 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:53 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:51 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 17:50 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye
  • 17:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 17:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2186']
  • 17:42 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 17:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 17:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
  • 17:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
  • 17:26 cwhite: Grafana 9x upgrade in production complete T317887
  • 17:25 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.24 refs T325587
  • 17:09 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1003.eqiad.wmnet with OS bullseye
  • 17:08 elukey@puppetmaster1001: conftool action : set/weight=10; selector: name=kubernetes2023.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/weight=10; selector: name=kubernetes2024.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2024.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2023.codfw.wmnet
  • 17:04 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: T327991 - None
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=codfw
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=codfw
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 16:57 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in codfw: None - None
  • 16:57 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in codfw: None - None
  • 16:51 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1003.eqiad.wmnet with reason: host reimage
  • 16:50 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: T327991 - None
  • 16:48 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1003.eqiad.wmnet with reason: host reimage
  • 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 16:22 papaul: rebooting mgmt switch in rack c3
  • 16:20 papaul: rebooting mgmt switch in rack b3
  • 16:17 papaul: rebooting mgmt switch in rack b1
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T328255)', diff saved to https://phabricator.wikimedia.org/P44704 and previous config saved to /var/cache/conftool/dbconfig/20230221-161552-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:15 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1003.eqiad.wmnet with OS bullseye
  • 16:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:13 papaul: rebooting mgmt switch in rack a7
  • 16:10 papaul: rebooting mgmt switch in rack a5
  • 16:06 moritzm: imported libxml2 2.9.4+dfsg1-7+deb10u5+icu67+wmf1 to component/icu67 for buster-wikimedia T329491
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: T329664
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:36 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
  • 14:36 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=prometheus2005.codfw.wmnet
  • 14:29 moritzm: installing NSS security updates
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: codfw maint (T327991)
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: codfw maint (T327991)
  • 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: codfw maint (T327991)
  • 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: codfw maint (T327991)
  • 14:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:02 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:00 vgutierrez: depooling codfw - T327991
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:55 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:55 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:55 gehel: depooling wdqs[2005,2007,2010].codfw.wmnet for switch maintenance - T327991
  • 13:54 gehel: depooling wcqs2001.codfw.wmnet for switch maintenance - T327991
  • 13:54 vgutierrez: depool doh2002 - T327991
  • 13:54 gehel: depooling elastic[2041-2044,2057-2058,2063-2064,2070,2077-2080].codfw.wmnet for switch maintenance - T327991
  • 13:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:51 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:48 godog: stop kafka on kafka-logging[2002,2004].codfw.wmnet - T327991
  • 13:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 215 hosts with reason: codfw row B upgrade
  • 13:38 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 215 hosts with reason: codfw row B upgrade
  • 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
  • 13:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
  • 13:34 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2022.codfw.wmnet with OS bullseye
  • 13:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 13:33 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2008.codfw.wmnet with OS bullseye
  • 13:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
  • 13:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 13:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2013.codfw.wmnet with OS bullseye
  • 13:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2024.codfw.wmnet with OS bullseye
  • 13:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2007.codfw.wmnet with OS bullseye
  • 13:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
  • 13:24 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1002.eqiad.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 13:19 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 13:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 13:17 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2023.codfw.wmnet with OS bullseye
  • 13:16 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2015.codfw.wmnet with OS bullseye
  • 13:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 13:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 13:15 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2006.codfw.wmnet with OS bullseye
  • 13:14 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 13:13 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 13:13 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:12 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 13:12 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2016.codfw.wmnet with OS bullseye
  • 13:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 13:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 13:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 13:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 13:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 13:08 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2005.codfw.wmnet with OS bullseye
  • 13:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 13:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 13:06 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 13:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 13:04 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 13:02 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 13:01 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2015.codfw.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 12:58 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 12:58 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2024.codfw.wmnet with OS bullseye
  • 12:57 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2022.codfw.wmnet with OS bullseye
  • 12:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2016.codfw.wmnet with reason: host reimage
  • 12:56 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2006.codfw.wmnet with reason: host reimage
  • 12:55 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
  • 12:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
  • 12:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2005.codfw.wmnet with reason: host reimage
  • 12:53 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2006.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2015.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2016.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2005.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
  • 12:50 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1002.eqiad.wmnet with OS bullseye
  • 12:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
  • 12:49 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2023.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
  • 12:41 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2015.codfw.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2016.codfw.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2006.codfw.wmnet with OS bullseye
  • 12:39 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2005.codfw.wmnet with OS bullseye
  • 12:36 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 12:35 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 12:34 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: T329664
  • 12:34 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster2002.codfw.wmnet with OS bullseye
  • 12:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster2002.codfw.wmnet with reason: host reimage
  • 12:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster2002.codfw.wmnet with reason: host reimage
  • 12:12 akosiaris: add 10.194.128.0/18 to kubernetes-ipv4 prefix-list for codfw. T326617
  • 12:06 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster2002.codfw.wmnet with OS bullseye
  • 12:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster2001.codfw.wmnet with OS bullseye
  • 11:55 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1001.eqiad.wmnet with reason: host reimage
  • 11:52 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1001.eqiad.wmnet with reason: host reimage
  • 11:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster2001.codfw.wmnet with reason: host reimage
  • 11:46 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster2001.codfw.wmnet with reason: host reimage
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44702 and previous config saved to /var/cache/conftool/dbconfig/20230221-114338-root.json
  • 11:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:34 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster2001.codfw.wmnet with OS bullseye
  • 11:32 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd2004.codfw.wmnet with OS bullseye
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44701 and previous config saved to /var/cache/conftool/dbconfig/20230221-112833-root.json
  • 11:27 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd2005.codfw.wmnet with OS bullseye
  • 11:26 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubetcd2006.codfw.wmnet with OS bullseye
  • 11:26 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:25 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:24 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: T329664
  • 11:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:19 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2004.codfw.wmnet with reason: host reimage
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bullseye
  • 11:16 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2004.codfw.wmnet with reason: host reimage
  • 11:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: host reimage
  • 11:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2006.codfw.wmnet with reason: host reimage
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44700 and previous config saved to /var/cache/conftool/dbconfig/20230221-111328-root.json
  • 11:12 vgutierrez: rolling upgrade to HAProxy 2.6.9 on ulsfo
  • 11:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: host reimage
  • 11:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2006.codfw.wmnet with reason: host reimage
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage
  • 11:00 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2006.codfw.wmnet with OS bullseye
  • 10:59 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2005.codfw.wmnet with OS bullseye
  • 10:59 root@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2004.codfw.wmnet with OS bullseye
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44699 and previous config saved to /var/cache/conftool/dbconfig/20230221-105823-root.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2161 T330134', diff saved to https://phabricator.wikimedia.org/P44698 and previous config saved to /var/cache/conftool/dbconfig/20230221-105714-ladsgroup.json
  • 10:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage
  • 10:55 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T330134', diff saved to https://phabricator.wikimedia.org/P44697 and previous config saved to /var/cache/conftool/dbconfig/20230221-105503-ladsgroup.json
  • 10:54 Amir1: Starting s8 codfw failover from db2161 to db2165 - T330134
  • 10:50 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 10:49 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23
  • 10:46 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bullseye
  • 10:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:33 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T330134', diff saved to https://phabricator.wikimedia.org/P44696 and previous config saved to /var/cache/conftool/dbconfig/20230221-103053-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134
  • 10:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:59 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:53 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool 2 services in codfw: T329664
  • 09:48 jayme@cumin1001: START - Cookbook sre.discovery.service-route depool 2 services in codfw: T329664
  • 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:31 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in codfw: maintenance
  • 09:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=prometheus2005.codfw.wmnet
  • 09:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
  • 09:14 vgutierrez: testing HAProxy 2.6.9 on cp4052 and cp4044
  • 09:13 jayme@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: maintenance
  • 09:12 hashar@deploy1002: Pruned MediaWiki: 1.40.0-wmf.22 (duration: 02m 16s)
  • 09:12 vgutierrez: update thirdparty/haproxy26 to version 2.6.9 for bullseye and buster (apt.wm.o)
  • 09:10 hashar@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.24 refs T325587 (duration: 45m 58s)
  • 08:49 moritzm: installing clamav security updates
  • 08:24 hashar@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.24 refs T325587
  • 08:21 kartik@deploy1002: Finished scap: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865) (duration: 16m 36s)
  • 08:09 kartik@deploy1002: kartik: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:04 kartik@deploy1002: Started scap: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865)
  • 07:49 XioNoX: Staging the new Junos version on the codfw row B switches - T327991
  • 01:06 urbanecm@deploy1002: Finished scap: Backport for cswikibooks: Enable visualeditor for all users (T330015) (duration: 08m 47s)
  • 00:59 urbanecm@deploy1002: urbanecm: Backport for cswikibooks: Enable visualeditor for all users (T330015) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 00:57 urbanecm@deploy1002: Started scap: Backport for cswikibooks: Enable visualeditor for all users (T330015)

2023-02-20

  • 21:35 zabe: close UTC late backport window
  • 21:33 zabe@deploy1002: Finished scap: T329983 T330104 (duration: 11m 51s)
  • 21:24 zabe@deploy1002: zabe: T329983 T330104 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:21 zabe@deploy1002: Started scap: T329983 T330104
  • 21:16 zabe: zabe@mwmaint1002:~$ mwscript namespaceDupes.php tawiki --fix # T329248
  • 21:14 zabe@deploy1002: Finished scap: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248) (duration: 08m 52s)
  • 21:07 zabe@deploy1002: superpes and zabe: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:06 zabe@deploy1002: Started scap: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248)
  • 20:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bullseye
  • 20:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 20:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 20:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bullseye
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44693 and previous config saved to /var/cache/conftool/dbconfig/20230220-185659-root.json
  • 18:50 taavi: taavi@mwmaint1002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki itwiki --current --all | tee T315510-itwiki.log # T315510
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44692 and previous config saved to /var/cache/conftool/dbconfig/20230220-184154-root.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44691 and previous config saved to /var/cache/conftool/dbconfig/20230220-182649-root.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44690 and previous config saved to /var/cache/conftool/dbconfig/20230220-181144-root.json
  • 17:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:16 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: sync
  • 16:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: sync
  • 16:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: sync
  • 16:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: sync
  • 16:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:40 volans: upgraded spicerack to v6.2.1 on the cumin hosts
  • 16:38 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Christina Macholan out of all services on: 943 hosts
  • 16:38 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Christina Macholan out of all services on: 943 hosts
  • 16:36 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Christina Macholan out of all services on: 1069 hosts
  • 16:36 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Christina Macholan out of all services on: 1069 hosts
  • 16:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:28 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 16:25 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.2.1
  • 16:25 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.2.1
  • 16:20 volans: uploaded spicerack_6.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 16:18 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:09 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
  • 15:57 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
  • 15:54 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 15:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:18 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 15:13 TheresNoTime: closing UTC afternoon backport window
  • 15:13 samtar@deploy1002: Finished scap: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224) (duration: 22m 03s)
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:04 samtar@deploy1002: samtar: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 15:03 TheresNoTime: UTC afternoon backport window overrunning
  • 14:58 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 14:51 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 14:51 samtar@deploy1002: Started scap: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224)
  • 14:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313) (duration: 08m 12s)
  • 14:31 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and ollieshotton: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:30 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313)
  • 14:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2003.codfw.wmnet with OS bullseye
  • 14:20 samtar@deploy1002: Finished scap: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866) (duration: 07m 44s)
  • 14:14 samtar@deploy1002: lucaswerkmeister-wmde and samtar: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:12 samtar@deploy1002: Started scap: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866)
  • 14:10 samtar@deploy1002: Finished scap: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026) (duration: 09m 00s)
  • 14:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2003.codfw.wmnet with reason: host reimage
  • 14:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2003.codfw.wmnet with reason: host reimage
  • 14:03 samtar@deploy1002: samtar and stang: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:01 samtar@deploy1002: Started scap: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026)
  • 13:55 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:51 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2003.codfw.wmnet with OS bullseye
  • 13:12 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:06 jbond: switch netbox to active/passive (had issues with active/active config)
  • 12:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bullseye
  • 12:49 jbond: switch netbox to active/active
  • 12:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2002.codfw.wmnet with reason: host reimage
  • 12:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2002.codfw.wmnet with reason: host reimage
  • 12:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 32 hosts with reason: In setup
  • 12:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 32 hosts with reason: In setup
  • 12:15 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bullseye
  • 12:12 moritzm: installing Java 8 security updates on Bullseye
  • 12:08 moritzm: upload openjdk-8 8u362-ga-4~deb11u1 to component/jdk8 for wikimedia-bullseye (forward port of latest Java 8 security fixes)
  • 11:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
  • 11:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
  • 11:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38565
  • 11:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:21 zabe: deployed updated mitigations for T326691
  • 10:21 zabe@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 59s)
  • 10:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38565
  • 10:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 2711
  • 10:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2711
  • 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 10:11 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 10:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2165 T330056', diff saved to https://phabricator.wikimedia.org/P44688 and previous config saved to /var/cache/conftool/dbconfig/20230220-095526-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T330056', diff saved to https://phabricator.wikimedia.org/P44687 and previous config saved to /var/cache/conftool/dbconfig/20230220-095308-ladsgroup.json
  • 09:52 Amir1: Starting s8 codfw failover from db2165 to db2161 - T330056
  • 09:48 akosiaris: Point out risk of MW train failing on Feb 21st in https://wikitech.wikimedia.org/wiki/Deployments#Tuesday,_February_21 due to WikiKube codfw upgrade
  • 09:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:33 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T330056', diff saved to https://phabricator.wikimedia.org/P44686 and previous config saved to /var/cache/conftool/dbconfig/20230220-092727-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330056
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330056
  • 09:09 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move all of NS-related config out of IS.php to a dedicated file, part III (T308932) (duration: 06m 24s)
  • 09:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:06 XioNoX: delete old group ID custom field from Netbox - https://netbox.wikimedia.org/extras/custom-fields/6/ - T260363
  • 09:01 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move all of NS-related config out of IS.php to a dedicated file, part II (T308932) (duration: 06m 47s)
  • 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:54 ladsgroup@deploy1002: Synchronized wmf-config/core-Namespaces.php: Move all of NS-related config out of IS.php to a dedicated file, part I (T308932) (duration: 16m 10s)
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 08:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mepps out of all services on: 946 hosts
  • 08:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Mepps out of all services on: 946 hosts
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mepps out of all services on: 1067 hosts
  • 08:42 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Mepps out of all services on: 1067 hosts
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:08 moritzm: updating openjdk-11 on elastic* servers T329957
  • 07:44 moritzm: imported jenkins 2.375.3 to thirdparty/ci T330045
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:27 Amir1: running migrateTagTemplate.php on all wikis (T329766)
  • 06:40 hashar: Restarting Gerrit

2023-02-18

  • 08:29 elukey: kill leftover processes of user `mepps` (offboarded) from stat100[4,5] to unblock puppet
  • 08:24 elukey: delete /var/log/{syslog,messages,user.log}.1 on kubestagetcd1005 to free space
  • 08:22 elukey: delete /var/log/{messages,user.log}.1 on kubestageetcd1006 to free space
  • 08:21 elukey: delete /var/log/syslog.1 on kubestageetcd1006 to free space

2023-02-17

  • 22:45 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 18:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10*.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2187.codfw.wmnet with OS bullseye
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 17:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2185.codfw.wmnet with OS bullseye
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:10 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2187.codfw.wmnet with OS bullseye
  • 17:06 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:06 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:06 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2185.codfw.wmnet with reason: host reimage
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2185.codfw.wmnet with reason: host reimage
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2185.codfw.wmnet with OS bullseye
  • 16:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2185.codfw.wmnet with OS bullseye
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2185.codfw.wmnet with OS bullseye
  • 16:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 16:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 16:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 16:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-staging-codfw: T327767
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:42 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10*.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:40 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2185']
  • 15:35 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader2004.wikimedia.org
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2185']
  • 15:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2185.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader2004.wikimedia.org on all recursors
  • 15:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader2004.wikimedia.org on all recursors
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2004.wikimedia.org - jmm@cumin2002"
  • 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2004.wikimedia.org - jmm@cumin2002"
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:46 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader2004.wikimedia.org
  • 14:46 elukey@cumin1001: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: T327767
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2185.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader2003.wikimedia.org
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for db218[567] - pt1979@cumin2002"
  • 14:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for db218[567] - pt1979@cumin2002"
  • 14:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader2003.wikimedia.org on all recursors
  • 14:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader2003.wikimedia.org on all recursors
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2003.wikimedia.org - jmm@cumin2002"
  • 14:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2003.wikimedia.org - jmm@cumin2002"
  • 14:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader2003.wikimedia.org
  • 13:40 godog: docker system prune on alert2001 - root fs almost full
  • 13:35 pfischer@deploy1002: Finished deploy [wikimedia/discovery/analytics@3a94765]: T327381: rdf-spark-tools update (duration: 02m 39s)
  • 13:33 pfischer@deploy1002: Started deploy [wikimedia/discovery/analytics@3a94765]: T327381: rdf-spark-tools update
  • 13:31 godog: docker system prune on alert1001 - root fs almost full
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:42 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster aux-eqiad: T329826
  • 12:38 jayme@cumin1001: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster aux-eqiad: T329826
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: JVM upgrades - elukey@cumin1001
  • 10:58 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: JVM upgrades - elukey@cumin1001
  • 10:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host idm2001.wikimedia.org with OS bullseye
  • 10:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 10:11 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 10:00 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host idm2001.wikimedia.org with OS bullseye
  • 08:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm2001.wikimedia.org
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm2001.wikimedia.org on all recursors
  • 08:35 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm2001.wikimedia.org on all recursors
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm2001.wikimedia.org - slyngshede@cumin1001"
  • 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm2001.wikimedia.org - slyngshede@cumin1001"
  • 08:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:30 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm2001.wikimedia.org
  • 02:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[2012,2015-2018,2020,2022,2023,2025-2027].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 00:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2012,2015-2018,2020,2022,2023,2025-2027].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 00:02 jhuneidi@deploy1002: Installation of scap version "4.37.0" completed for 564 hosts
  • 00:02 jhuneidi@deploy1002: Installing scap version "4.37.0" for 564 hosts

2023-02-16

  • 23:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[2013-2014,2019,2021,2024].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:15 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 22:15 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 22:15 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 22:13 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 22:13 rzl@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 22:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 22:11 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 22:10 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 22:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 22:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 22:09 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 22:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 22:08 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 22:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2013-2014,2019,2021,2024].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:07 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 22:07 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 22:06 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 21:24 TheresNoTime: close UTC late backport window
  • 21:21 eileen: civicrm upgraded from efa4c485 to ffc16d2d
  • 21:11 samtar@deploy1002: Finished scap: Backport for Remove Research Incentive survey from swwiki (T321252) (duration: 08m 13s)
  • 21:09 SandraEbele: Added new field referer_data to wmf.webrequest table using the alter table statement
  • 21:04 samtar@deploy1002: samtar and dani: Backport for Remove Research Incentive survey from swwiki (T321252) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:03 samtar@deploy1002: Started scap: Backport for Remove Research Incentive survey from swwiki (T321252)
  • 19:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.23 refs T325586
  • 19:03 dancy@deploy1002: Installation of scap version "4.36.0" completed for 564 hosts
  • 19:03 dancy@deploy1002: Installing scap version "4.36.0" for 564 hosts
  • 18:56 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f1a930] (duration: 01m 23s)
  • 18:54 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f1a930]
  • 18:54 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930] (thin): Regular analytics weekly train THIN [analytics/refinery@0f1a930] (duration: 00m 07s)
  • 18:54 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930] (thin): Regular analytics weekly train THIN [analytics/refinery@0f1a930]
  • 18:52 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930]: Regular analytics weekly train [analytics/refinery@0f1a930] (duration: 07m 11s)
  • 18:45 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930]: Regular analytics weekly train [analytics/refinery@0f1a930]
  • 18:37 SandraEbele: killed webrequest oozie bundle to deploy refinery changes.
  • 18:28 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:25 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:24 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:22 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:21 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:47 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2106', diff saved to https://phabricator.wikimedia.org/P44678 and previous config saved to /var/cache/conftool/dbconfig/20230216-174704-jynus.json
  • 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: JVM upgrades - elukey@cumin1001
  • 17:25 papaul: PDU maintenance in rack A8
  • 17:25 papaul: PDU maintenance in rack A1 complete
  • 17:21 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: JVM upgrades - elukey@cumin1001
  • 17:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:07 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:55 SandraEbele: Deployed refinery-source change to remove Github.io from Mediasites definition of referees.
  • 16:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part III (T308932) (duration: 06m 54s)
  • 16:19 moritzm: installing net-snmp security updates on Buster
  • 16:11 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part II (T308932) (duration: 06m 48s)
  • 16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:04 ladsgroup@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part I (T308932) (duration: 07m 05s)
  • 15:40 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 papaul: PDU maintenance in rack A1
  • 15:39 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:35 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 taavi@deploy1002: Finished scap: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550) (duration: 09m 23s)
  • 14:17 taavi@deploy1002: taavi and sgimeno: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:16 taavi@deploy1002: Started scap: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550)
  • 14:13 taavi: taavi@mwmaint1002:~$ mwscript updateCollation.php --wiki=simplewiki --previous-collation=uppercase | tee T329815.log # T329815
  • 14:13 taavi@deploy1002: Finished scap: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815) (duration: 10m 38s)
  • 14:04 taavi@deploy1002: superpes and taavi: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:02 taavi@deploy1002: Started scap: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815)
  • 12:06 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:06 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:04 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:04 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:39 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:39 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 claime: repool parse1012 for monitoring of possible CPU1 issues
  • 10:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bullseye
  • 10:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 10:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 10:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 10:31 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 10:22 moritzm: installing postgresql-11 security updates on maps*
  • 10:20 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host idm1001.wikimedia.org with OS bullseye
  • 10:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm1001.wikimedia.org
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm1001.wikimedia.org on all recursors
  • 10:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm1001.wikimedia.org on all recursors
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"
  • 10:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm1001.wikimedia.org
  • 09:58 godog: issue test page with: amtool alert add TestPage address=6.6.6.6 team=sre severity=page job=testjob --annotation=runbook=lol --annotation=description='this is a test page, please ignore' --annotation=dashboard=no
  • 09:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 09:35 godog: puppet cert clean labstore100[67] - T319217
  • 09:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 09:07 moritzm: uploaded openjdk-8 8u362-ga-4~deb10u1 to component/jdk8 for buster-wikimedia (forward port of latest Java 8 security release)
  • 08:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
  • 08:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
  • 08:25 moritzm: upgrading cassandra-dev to Java 8u362-ga-4
  • 08:17 apergos: UTC morning backport and config training window done
  • 08:15 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865) (duration: 12m 38s)
  • 08:05 kartik@deploy1002: kartik: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865)
  • 07:41 elukey: depool parse1012 to allow the service ops team to check it
  • 07:39 elukey: powercycle parse1012 - CPU1 errors registered in `racadm getsel`
  • 07:25 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-airflow1005.eqiad.wmnet with OS buster
  • 06:16 kart_: Updated cxserver to 2023-02-15-085109-production (T328310, T110190, T116466)
  • 06:11 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:11 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-02-15

  • 23:30 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.23 refs T325586 (duration: 06m 43s)
  • 23:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.23 refs T325586
  • 23:15 ladsgroup@deploy1002: Finished scap: Backport for Change linter maintenance scripts to use existing config variables (T329342) (duration: 08m 12s)
  • 23:08 ladsgroup@deploy1002: ladsgroup: Backport for Change linter maintenance scripts to use existing config variables (T329342) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 23:06 ladsgroup@deploy1002: Started scap: Backport for Change linter maintenance scripts to use existing config variables (T329342)
  • 23:04 dduvall@deploy1002: Finished scap: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740) (duration: 08m 47s)
  • 23:01 Amir1: running linter migrate namespace on all wikis (T329764)
  • 22:57 dduvall@deploy1002: cscott and dduvall: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:55 dduvall@deploy1002: Started scap: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740)
  • 22:22 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 22:17 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 21:15 urandom: rebooting sessionstore2001 w/o cookbook — T327954
  • 21:10 dancy@deploy1002: Installation of scap version "4.35.0" completed for 563 hosts
  • 21:09 dancy@deploy1002: Installing scap version "4.35.0" for 563 hosts
  • 20:39 urandom: rebooting sessionstore2001 w/o cookbook — T327954
  • 20:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 20:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 20:04 urandom: setting Cassandra query trace probability to 0 (disabled) on sessionstore cluster, codfw datacenter — T327954
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 19:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 19:48 urandom: setting Cassandra query trace probability to 0.25 on sessionstore cluster, codfw datacenter — T327954
  • 19:42 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: Depooling while we attempt to reproduce errors — T327954
  • 19:36 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: Depooling while we attempt to reproduce errors — T327954
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 19:36 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 19:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.23"
  • 19:23 dduvall: rolling back due to spike in parsoid errors (T325586)
  • 19:23 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.23 refs T325586 (duration: 06m 36s)
  • 19:18 dduvall: correction: spike may be temporary. holding (T325586)
  • 19:17 dduvall: large spike in undefined property errors. rolling back (T325586)
  • 19:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.23 refs T325586
  • 18:41 akosiaris: dummy entry
  • 18:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS buster
  • 18:12 moritzm: installing curl security updates on bullseye (not buster)
  • 18:12 moritzm: installing curl security updates on buster
  • 17:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 17:24 sukhe: [done] disable puppet on A:dns-auth: merging CR 889560
  • 17:18 sukhe: disable puppet on A:dns-auth: merging CR 889560
  • 16:52 moritzm: installing postgresql-11 security updates
  • 16:45 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 16:43 moritzm: restarting Exim on MXes to pick up gnutls security updates
  • 16:42 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 16:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 16:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 16:16 moritzm: installing gnutls28 security updates
  • 16:12 bking@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1005.eqiad.wmnet with OS buster
  • 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow1005.eqiad.wmnet
  • 16:12 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-airflow1005.eqiad.wmnet
  • 16:03 thcipriani: restart ci jenkins for updates
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44673 and previous config saved to /var/cache/conftool/dbconfig/20230215-160100-ladsgroup.json
  • 15:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35467
  • 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35467
  • 15:49 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:48 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44672 and previous config saved to /var/cache/conftool/dbconfig/20230215-154555-ladsgroup.json
  • 15:42 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:41 eevans@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool sessionstore in codfw: maintenance
  • 15:41 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 15:39 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 15:39 eevans@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool sessionstore in codfw: maintenance
  • 15:39 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:33 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 15:33 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 15:33 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:33 bking@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44671 and previous config saved to /var/cache/conftool/dbconfig/20230215-153050-ladsgroup.json
  • 15:30 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:29 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:27 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:27 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:26 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:24 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44670 and previous config saved to /var/cache/conftool/dbconfig/20230215-151545-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:55 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:55 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:33 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 TheresNoTime: closing UTC afternoon backport window
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 samtar@deploy1002: Finished scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) (duration: 08m 25s)
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:29 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:27 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:24 samtar@deploy1002: samtar: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:22 samtar@deploy1002: Started scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224)
  • 14:22 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:22 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:19 TheresNoTime: `samtar@mwmaint1002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwikinews --current --all | tee persistRevisionThreadItems.out.txt` in screen session `25805.T315510` for T315510
  • 14:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:19 samtar@deploy1002: Finished scap: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627) (duration: 07m 34s)
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bullseye
  • 14:14 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:13 samtar@deploy1002: matmarex and samtar: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:13 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:11 samtar@deploy1002: Started scap: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627)
  • 14:11 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940) (duration: 09m 13s)
  • 14:04 samtar@deploy1002: samtar and matmarex: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:02 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940)
  • 14:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 13:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 13:40 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bullseye
  • 13:27 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php newiki pageassessments` T328224
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2127 T329730', diff saved to https://phabricator.wikimedia.org/P44668 and previous config saved to /var/cache/conftool/dbconfig/20230215-130822-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 primary T329730', diff saved to https://phabricator.wikimedia.org/P44667 and previous config saved to /var/cache/conftool/dbconfig/20230215-130653-ladsgroup.json
  • 13:06 Amir1: Starting s3 codfw failover from db2127 to db2105 - T329730
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2105 with weight 0 T329730', diff saved to https://phabricator.wikimedia.org/P44666 and previous config saved to /var/cache/conftool/dbconfig/20230215-124729-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T329730
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T329730
  • 12:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4004.wikimedia.org with OS buster
  • 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS bullseye
  • 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 11:23 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-airflow1005.eqiad.wmnet with OS bullseye
  • 11:22 Emperor: thanos-be2001 rm /srv/swift-storage/sda3/tmp/b0e33b98-f8be-409b-a9d2-246ad5812db0
  • 11:21 Emperor: thanos-be2001 rm /srv/swift-storage/sda3/tmp/c10e5844-1b19-4c8d-b474-801ad3dd6849
  • 11:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bullseye
  • 11:09 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-ctrl2002.codfw.wmnet with OS bullseye
  • 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
  • 10:52 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
  • 10:45 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
  • 10:42 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:40 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-ctrl2002.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-ctrl2001.codfw.wmnet with OS bullseye
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:39 Emperor: discard /var/spool/rsyslog on thanos-be2001 T329712
  • 10:33 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:25 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:13 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:13 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:13 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 10:10 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-ctrl2001.codfw.wmnet with OS bullseye
  • 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet with OS bullseye
  • 09:54 moritzm: installing openjdk-11 security updates
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
  • 09:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
  • 09:41 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2003.codfw.wmnet with OS bullseye
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:34 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:32 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 09:23 cdanis@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: upgrade to v1.23
  • 09:23 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 09:09 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:06 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:06 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 08:55 Emperor: truncate -s 2GB /srv/log/swift/server.log.1 on thanos-be2001 to free space in /
  • 08:55 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:54 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 08:53 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 08:52 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:39 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 08:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 08:25 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:06 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 07:59 vgutierrez: rolling upgrade to HAProxy 2.6.8-2 in cp nodes
  • 07:56 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:54 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 04:27 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
  • 04:27 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 03:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: Puppet failure during reimaging
  • 03:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: Puppet failure during reimaging
  • 02:03 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1001.eqiad.wmnet with OS bullseye
  • 01:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 01:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 01:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 01:22 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T299954 (duration: 06m 50s)
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44663 and previous config saved to /var/cache/conftool/dbconfig/20230215-011110-ladsgroup.json
  • 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44662 and previous config saved to /var/cache/conftool/dbconfig/20230215-005604-ladsgroup.json
  • 00:49 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 00:49 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 00:47 krinkle@deploy1002: Synchronized wmf-config/: I3144a56e17ecb (duration: 06m 33s)
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44661 and previous config saved to /var/cache/conftool/dbconfig/20230215-004058-ladsgroup.json
  • 00:35 krinkle@deploy1002: Synchronized multiversion/: I3144a56e17ecb (duration: 06m 51s)
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44660 and previous config saved to /var/cache/conftool/dbconfig/20230215-002552-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44659 and previous config saved to /var/cache/conftool/dbconfig/20230215-002245-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44658 and previous config saved to /var/cache/conftool/dbconfig/20230215-002224-ladsgroup.json
  • 00:15 krinkle@deploy1002: Synchronized src/Profiler.php: Ife7bde (duration: 07m 05s)
  • 00:08 krinkle@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: staging config patch --krinkle (duration: 04m 22s)
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44657 and previous config saved to /var/cache/conftool/dbconfig/20230215-000717-ladsgroup.json
  • 00:03 krinkle@deploy1002: Locking from deployment [ALL REPOSITORIES]: staging config patch --krinkle (planned duration: 60m 00s)
  • 00:02 cwhite@deploy1002: Finished deploy [releng/phatality@b1a2a70]: T314098 (duration: 00m 14s)
  • 00:01 cwhite@deploy1002: Started deploy [releng/phatality@b1a2a70]: T314098

2023-02-14

  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44656 and previous config saved to /var/cache/conftool/dbconfig/20230214-235211-ladsgroup.json
  • 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44655 and previous config saved to /var/cache/conftool/dbconfig/20230214-233705-ladsgroup.json
  • 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44654 and previous config saved to /var/cache/conftool/dbconfig/20230214-233058-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44653 and previous config saved to /var/cache/conftool/dbconfig/20230214-233037-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44652 and previous config saved to /var/cache/conftool/dbconfig/20230214-231531-ladsgroup.json
  • 23:14 cwhite@deploy1002: Finished deploy [releng/phatality@eaa4c16]: T314098 (duration: 00m 07s)
  • 23:14 cwhite@deploy1002: Started deploy [releng/phatality@eaa4c16]: T314098
  • 23:14 cwhite@deploy1002: Finished deploy [releng/phatality@eaa4c16]: T314098 (duration: 10m 40s)
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2442.codfw.wmnet with OS buster
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2443.codfw.wmnet with OS buster
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 cwhite@deploy1002: Started deploy [releng/phatality@eaa4c16]: T314098
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44651 and previous config saved to /var/cache/conftool/dbconfig/20230214-230025-ladsgroup.json
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2443.codfw.wmnet with reason: host reimage
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2442.codfw.wmnet with reason: host reimage
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2443.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2442.codfw.wmnet with reason: host reimage
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44650 and previous config saved to /var/cache/conftool/dbconfig/20230214-224519-ladsgroup.json
  • 22:39 eoghan@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bullseye
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44649 and previous config saved to /var/cache/conftool/dbconfig/20230214-223931-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44648 and previous config saved to /var/cache/conftool/dbconfig/20230214-223910-ladsgroup.json
  • 22:28 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 22:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 22:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2443.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2442.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2441.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:25 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2440.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44647 and previous config saved to /var/cache/conftool/dbconfig/20230214-222403-ladsgroup.json
  • 22:10 eoghan@cumin2002: START - Cookbook sre.ganeti.reimage for host aphlict2001.codfw.wmnet with OS bullseye
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44646 and previous config saved to /var/cache/conftool/dbconfig/20230214-220857-ladsgroup.json
  • 22:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44645 and previous config saved to /var/cache/conftool/dbconfig/20230214-215351-ladsgroup.json
  • 21:47 dancy@deploy1002: Finished scap: Backport for Change linter maintenance scripts to use existing config varaibles (T329342) (duration: 07m 44s)
  • 21:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2441.codfw.wmnet with reason: host reimage
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44644 and previous config saved to /var/cache/conftool/dbconfig/20230214-214642-ladsgroup.json
  • 21:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44643 and previous config saved to /var/cache/conftool/dbconfig/20230214-214621-ladsgroup.json
  • 21:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2441.codfw.wmnet with reason: host reimage
  • 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2440.codfw.wmnet with reason: host reimage
  • 21:41 dancy@deploy1002: dancy and sbailey: Backport for Change linter maintenance scripts to use existing config varaibles (T329342) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:39 dancy@deploy1002: Started scap: Backport for Change linter maintenance scripts to use existing config varaibles (T329342)
  • 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2440.codfw.wmnet with reason: host reimage
  • 21:35 dancy@deploy1002: Backport cancelled.
  • 21:34 dancy@deploy1002: Finished scap: Backport for Enable Page Tools for logged in users across all wikis (T328692) (duration: 12m 50s)
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44642 and previous config saved to /var/cache/conftool/dbconfig/20230214-213115-ladsgroup.json
  • 21:23 dancy@deploy1002: dancy and bwang: Backport for Enable Page Tools for logged in users across all wikis (T328692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:21 dancy@deploy1002: Started scap: Backport for Enable Page Tools for logged in users across all wikis (T328692)
  • 21:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2441.codfw.wmnet with OS buster
  • 21:20 dancy@deploy1002: Backport cancelled.
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44640 and previous config saved to /var/cache/conftool/dbconfig/20230214-211608-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44639 and previous config saved to /var/cache/conftool/dbconfig/20230214-210102-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44638 and previous config saved to /var/cache/conftool/dbconfig/20230214-205709-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44637 and previous config saved to /var/cache/conftool/dbconfig/20230214-205633-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44636 and previous config saved to /var/cache/conftool/dbconfig/20230214-204126-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44635 and previous config saved to /var/cache/conftool/dbconfig/20230214-202620-ladsgroup.json
  • 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: failure during reimaging
  • 20:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: failure during reimaging
  • 20:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4004.wikimedia.org with OS buster
  • 20:12 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.23 refs T325586
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44634 and previous config saved to /var/cache/conftool/dbconfig/20230214-201114-ladsgroup.json
  • 20:09 bking@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1005.eqiad.wmnet with OS bullseye
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44633 and previous config saved to /var/cache/conftool/dbconfig/20230214-200822-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44632 and previous config saved to /var/cache/conftool/dbconfig/20230214-200801-ladsgroup.json
  • 19:55 AndyRussG: update SmashPig 683df497 -> c6775c60
  • 19:53 dduvall@deploy1002: Pruned MediaWiki: 1.40.0-wmf.21 (duration: 02m 10s)
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44631 and previous config saved to /var/cache/conftool/dbconfig/20230214-195255-ladsgroup.json
  • 19:50 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.23 refs T325586 (duration: 09m 14s)
  • 19:46 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2069.codfw.wmnet,service=elasticsearch-psi-ssl
  • 19:46 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:46 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.codfw.wmnet,service=elasticsearch-psi-ssl
  • 19:45 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.cofdw.wmnet,service=elasticsearch-psi-ssl
  • 19:43 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.cofdw.wmnet
  • 19:43 gehel@puppetmaster1001: conftool action : set/pooled=active; selector: name=elastic2069.cofdw.wmnet
  • 19:41 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.23 refs T325586
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44630 and previous config saved to /var/cache/conftool/dbconfig/20230214-193748-ladsgroup.json
  • 19:37 dduvall: did not run `docker system prune` due to objections
  • 19:36 mutante: root@deploy1002:/srv# rm -rf deployment.T307349/
  • 19:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:34 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 19:33 dduvall: running `docker system prune` on deploy1002 to free up disk space on /srv
  • 19:31 dduvall: scap sync-world failed due to lack of disk space on deploy1002 /srv (cc T325586)
  • 19:31 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 19:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44629 and previous config saved to /var/cache/conftool/dbconfig/20230214-192242-ladsgroup.json
  • 19:22 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:21 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:17 papaul: upgrading firmware on mc-gp1003
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44628 and previous config saved to /var/cache/conftool/dbconfig/20230214-191550-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:09 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:08 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.23 refs T325586 (duration: 51m 03s)
  • 19:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:06 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 18:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 18:55 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 18:54 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:47 cdanis@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: upgrade to v1.23
  • 18:46 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 18:38 sukhe: reimage dns4004 back to buster to resolve pdns-rec Prometheus endpoint issues: T321309
  • 18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mc-gp1002']
  • 18:37 cdanis@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: upgrade to v1.23
  • 18:36 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns4004.wikimedia.org with OS bullseye
  • 18:35 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1002']
  • 18:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp1002']
  • 18:17 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.23 refs T325586
  • 18:15 papaul: upgrading firmware on mc-gp1002
  • 18:10 dduvall: refactored failed security patch for T278365
  • 18:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1002']
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 17:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bullseye
  • 17:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:24 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 17:18 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 17:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 17:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 17:05 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - rc1.mediawiki.page_change: enable on all wikis (duration: 07m 11s)
  • 16:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 16:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 16:38 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 16:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 16:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 16:36 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 16:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 16:35 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:34 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:34 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:34 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:33 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:33 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:27 andrew@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 16:22 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - Produce rc1.mediawiki.page_change to eventgate-main (duration: 09m 01s)
  • 16:16 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 16:12 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2451.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2448.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2450.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2449.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:52 moritzm: uploaded src:icu67 67.1-7~wmf1 to buster-wikimedia/component/icu67 T329491
  • 15:50 inflatador: bking@deploy1002 'deploying rdf-streaming-updater prod eqiad T304914'
  • 15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:48 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:45 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:43 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:34 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2451.codfw.wmnet with reason: host reimage
  • 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2450.codfw.wmnet with reason: host reimage
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2451.codfw.wmnet with reason: host reimage
  • 15:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 15:24 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2450.codfw.wmnet with reason: host reimage
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 15:21 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 15:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:10 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2451.codfw.wmnet with OS buster
  • 15:05 moritzm: installing openjdk-11 security updates
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2450.codfw.wmnet with OS buster
  • 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2449.codfw.wmnet with OS buster
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2448.codfw.wmnet with OS buster
  • 14:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 14:54 godog: roll-restart pybal in eqiad/codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/889083
  • 14:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:41 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:41 andrew@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 14:41 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:40 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 14:30 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:28 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:26 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:25 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:24 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:22 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:22 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:21 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:17 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:11 moritzm: installing libde265 security updates
  • 13:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1001.eqiad.wmnet with reason: host reimage
  • 13:20 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1001.eqiad.wmnet with reason: host reimage
  • 13:08 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44624 and previous config saved to /var/cache/conftool/dbconfig/20230214-130708-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44623 and previous config saved to /var/cache/conftool/dbconfig/20230214-125203-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44622 and previous config saved to /var/cache/conftool/dbconfig/20230214-123659-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44621 and previous config saved to /var/cache/conftool/dbconfig/20230214-122154-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44620 and previous config saved to /var/cache/conftool/dbconfig/20230214-120649-root.json
  • 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44619 and previous config saved to /var/cache/conftool/dbconfig/20230214-115144-root.json
  • 11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
  • 11:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
  • 11:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
  • 11:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
  • 10:58 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:19 moritzm: installing imagemagick security updates on bullseye
  • 10:09 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:08 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:03 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:01 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44618 and previous config saved to /var/cache/conftool/dbconfig/20230214-095544-root.json
  • 09:50 godog: roll-restart pybal in eqiad/codfw to pick up logs-api service - T320702
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44616 and previous config saved to /var/cache/conftool/dbconfig/20230214-094040-root.json
  • 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44615 and previous config saved to /var/cache/conftool/dbconfig/20230214-092535-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44614 and previous config saved to /var/cache/conftool/dbconfig/20230214-091030-root.json
  • 09:04 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api
  • 08:57 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,service=logs-api
  • 08:55 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api,dc=codfw
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44613 and previous config saved to /var/cache/conftool/dbconfig/20230214-085525-root.json
  • 08:54 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm6001.drmrs.wmnet
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44612 and previous config saved to /var/cache/conftool/dbconfig/20230214-084020-root.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T328817)', diff saved to https://phabricator.wikimedia.org/P44611 and previous config saved to /var/cache/conftool/dbconfig/20230214-083022-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T328817)', diff saved to https://phabricator.wikimedia.org/P44610 and previous config saved to /var/cache/conftool/dbconfig/20230214-082915-marostegui.json
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44609 and previous config saved to /var/cache/conftool/dbconfig/20230214-082854-marostegui.json
  • 08:26 vgutierrez: rolling upgrade to HAProxy 2.6.8 in eqsin - T321775
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm6001.drmrs.wmnet
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P44608 and previous config saved to /var/cache/conftool/dbconfig/20230214-081348-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P44607 and previous config saved to /var/cache/conftool/dbconfig/20230214-075842-marostegui.json
  • 07:58 XioNoX: enable CF in esams
  • 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 07:58 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44606 and previous config saved to /var/cache/conftool/dbconfig/20230214-074335-marostegui.json
  • 07:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44605 and previous config saved to /var/cache/conftool/dbconfig/20230214-072452-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44604 and previous config saved to /var/cache/conftool/dbconfig/20230214-072157-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44603 and previous config saved to /var/cache/conftool/dbconfig/20230214-072136-marostegui.json
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1099.eqiad.wmnet
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1099.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:10 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1099.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44602 and previous config saved to /var/cache/conftool/dbconfig/20230214-070946-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P44601 and previous config saved to /var/cache/conftool/dbconfig/20230214-070630-marostegui.json
  • 07:01 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1099.eqiad.wmnet
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44600 and previous config saved to /var/cache/conftool/dbconfig/20230214-065440-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P44599 and previous config saved to /var/cache/conftool/dbconfig/20230214-065123-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44598 and previous config saved to /var/cache/conftool/dbconfig/20230214-063933-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44597 and previous config saved to /var/cache/conftool/dbconfig/20230214-063617-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44596 and previous config saved to /var/cache/conftool/dbconfig/20230214-062118-marostegui.json
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44595 and previous config saved to /var/cache/conftool/dbconfig/20230214-062057-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44594 and previous config saved to /var/cache/conftool/dbconfig/20230214-061434-marostegui.json
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44593 and previous config saved to /var/cache/conftool/dbconfig/20230214-061413-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44592 and previous config saved to /var/cache/conftool/dbconfig/20230214-060551-marostegui.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P44591 and previous config saved to /var/cache/conftool/dbconfig/20230214-055906-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44590 and previous config saved to /var/cache/conftool/dbconfig/20230214-055044-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P44589 and previous config saved to /var/cache/conftool/dbconfig/20230214-054400-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44588 and previous config saved to /var/cache/conftool/dbconfig/20230214-053538-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44587 and previous config saved to /var/cache/conftool/dbconfig/20230214-053325-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44586 and previous config saved to /var/cache/conftool/dbconfig/20230214-053304-marostegui.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44585 and previous config saved to /var/cache/conftool/dbconfig/20230214-052854-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44584 and previous config saved to /var/cache/conftool/dbconfig/20230214-051758-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44583 and previous config saved to /var/cache/conftool/dbconfig/20230214-050644-marostegui.json
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44582 and previous config saved to /var/cache/conftool/dbconfig/20230214-050623-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44581 and previous config saved to /var/cache/conftool/dbconfig/20230214-050252-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P44580 and previous config saved to /var/cache/conftool/dbconfig/20230214-045117-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44579 and previous config saved to /var/cache/conftool/dbconfig/20230214-044745-marostegui.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44578 and previous config saved to /var/cache/conftool/dbconfig/20230214-044432-marostegui.json
  • 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44577 and previous config saved to /var/cache/conftool/dbconfig/20230214-044411-marostegui.json
  • 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P44576 and previous config saved to /var/cache/conftool/dbconfig/20230214-043610-marostegui.json
  • 04:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44575 and previous config saved to /var/cache/conftool/dbconfig/20230214-042905-marostegui.json
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44574 and previous config saved to /var/cache/conftool/dbconfig/20230214-042104-marostegui.json
  • 04:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44573 and previous config saved to /var/cache/conftool/dbconfig/20230214-041359-marostegui.json
  • 03:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44572 and previous config saved to /var/cache/conftool/dbconfig/20230214-035922-marostegui.json
  • 03:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 03:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 03:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44571 and previous config saved to /var/cache/conftool/dbconfig/20230214-035852-marostegui.json
  • 03:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44570 and previous config saved to /var/cache/conftool/dbconfig/20230214-035639-marostegui.json
  • 03:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44569 and previous config saved to /var/cache/conftool/dbconfig/20230214-035618-marostegui.json
  • 03:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44568 and previous config saved to /var/cache/conftool/dbconfig/20230214-034112-marostegui.json
  • 03:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2447.codfw.wmnet with OS buster
  • 03:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44567 and previous config saved to /var/cache/conftool/dbconfig/20230214-032606-marostegui.json
  • 03:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44566 and previous config saved to /var/cache/conftool/dbconfig/20230214-031059-marostegui.json
  • 03:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2445.codfw.wmnet with OS buster
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2446.codfw.wmnet with OS buster
  • 03:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 03:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44565 and previous config saved to /var/cache/conftool/dbconfig/20230214-030345-marostegui.json
  • 03:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44564 and previous config saved to /var/cache/conftool/dbconfig/20230214-025917-marostegui.json
  • 02:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2444.codfw.wmnet with OS buster
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:53 tgr: Deployed security patch for T328643
  • 02:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44563 and previous config saved to /var/cache/conftool/dbconfig/20230214-024410-marostegui.json
  • 02:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2447.codfw.wmnet with OS buster
  • 02:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2446.codfw.wmnet with reason: host reimage
  • 02:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
  • 02:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2446.codfw.wmnet with reason: host reimage
  • 02:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
  • 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 02:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 02:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44562 and previous config saved to /var/cache/conftool/dbconfig/20230214-022904-marostegui.json
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2446.codfw.wmnet with OS buster
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2445.codfw.wmnet with OS buster
  • 02:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44561 and previous config saved to /var/cache/conftool/dbconfig/20230214-021358-marostegui.json
  • 02:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2444.codfw.wmnet with OS buster
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44560 and previous config saved to /var/cache/conftool/dbconfig/20230214-020852-marostegui.json
  • 02:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44559 and previous config saved to /var/cache/conftool/dbconfig/20230214-020831-marostegui.json
  • 02:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44558 and previous config saved to /var/cache/conftool/dbconfig/20230214-020748-marostegui.json
  • 02:01 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2440.codfw.wmnet with OS buster
  • 01:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44557 and previous config saved to /var/cache/conftool/dbconfig/20230214-015325-marostegui.json
  • 01:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P44556 and previous config saved to /var/cache/conftool/dbconfig/20230214-015242-marostegui.json
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 01:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2443.codfw.wmnet with OS buster
  • 01:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2442.codfw.wmnet with OS buster
  • 01:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44555 and previous config saved to /var/cache/conftool/dbconfig/20230214-013818-marostegui.json
  • 01:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P44554 and previous config saved to /var/cache/conftool/dbconfig/20230214-013736-marostegui.json
  • 01:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2441.codfw.wmnet with OS buster
  • 01:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2440.codfw.wmnet with OS buster
  • 01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44553 and previous config saved to /var/cache/conftool/dbconfig/20230214-012312-marostegui.json
  • 01:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44552 and previous config saved to /var/cache/conftool/dbconfig/20230214-012230-marostegui.json
  • 01:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44551 and previous config saved to /var/cache/conftool/dbconfig/20230214-011758-marostegui.json
  • 01:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44550 and previous config saved to /var/cache/conftool/dbconfig/20230214-011720-marostegui.json
  • 01:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44549 and previous config saved to /var/cache/conftool/dbconfig/20230214-010214-marostegui.json
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44548 and previous config saved to /var/cache/conftool/dbconfig/20230214-004707-marostegui.json
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2443.codfw.wmnet with OS buster
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2442.codfw.wmnet with OS buster
  • 00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2441.codfw.wmnet with OS buster
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44547 and previous config saved to /var/cache/conftool/dbconfig/20230214-003201-marostegui.json
  • 00:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44546 and previous config saved to /var/cache/conftool/dbconfig/20230214-002620-marostegui.json
  • 00:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44545 and previous config saved to /var/cache/conftool/dbconfig/20230214-002559-marostegui.json
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2438.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2439.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44544 and previous config saved to /var/cache/conftool/dbconfig/20230214-002214-marostegui.json
  • 00:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44543 and previous config saved to /var/cache/conftool/dbconfig/20230214-002136-marostegui.json
  • 00:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44542 and previous config saved to /var/cache/conftool/dbconfig/20230214-001053-marostegui.json
  • 00:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P44541 and previous config saved to /var/cache/conftool/dbconfig/20230214-000629-marostegui.json
  • 00:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp2003']
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44540 and previous config saved to /var/cache/conftool/dbconfig/20230214-000419-ladsgroup.json
  • 00:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2439.codfw.wmnet with reason: host reimage

2023-02-13

  • 23:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2438.codfw.wmnet with reason: host reimage
  • 23:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2003']
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2439.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2003']
  • 23:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44539 and previous config saved to /var/cache/conftool/dbconfig/20230213-235546-marostegui.json
  • 23:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2438.codfw.wmnet with reason: host reimage
  • 23:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P44538 and previous config saved to /var/cache/conftool/dbconfig/20230213-235123-marostegui.json
  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44537 and previous config saved to /var/cache/conftool/dbconfig/20230213-234912-ladsgroup.json
  • 23:48 papaul: upgrading firmware on mc-gp2003
  • 23:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2003']
  • 23:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44536 and previous config saved to /var/cache/conftool/dbconfig/20230213-234040-marostegui.json
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2439.codfw.wmnet with OS buster
  • 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44535 and previous config saved to /var/cache/conftool/dbconfig/20230213-233617-marostegui.json
  • 23:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2438.codfw.wmnet with OS buster
  • 23:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44534 and previous config saved to /var/cache/conftool/dbconfig/20230213-233407-marostegui.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44533 and previous config saved to /var/cache/conftool/dbconfig/20230213-233406-ladsgroup.json
  • 23:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 23:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44532 and previous config saved to /var/cache/conftool/dbconfig/20230213-233356-marostegui.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44531 and previous config saved to /var/cache/conftool/dbconfig/20230213-231900-ladsgroup.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44530 and previous config saved to /var/cache/conftool/dbconfig/20230213-231850-marostegui.json
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2436.codfw.wmnet with OS buster
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2437.codfw.wmnet with OS buster
  • 23:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44529 and previous config saved to /var/cache/conftool/dbconfig/20230213-231402-marostegui.json
  • 23:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 23:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44528 and previous config saved to /var/cache/conftool/dbconfig/20230213-231137-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44527 and previous config saved to /var/cache/conftool/dbconfig/20230213-231116-ladsgroup.json
  • 23:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mc-gp2002']
  • 23:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44526 and previous config saved to /var/cache/conftool/dbconfig/20230213-230343-marostegui.json
  • 22:59 zabe@deploy1002: Finished scap: Backport for stop setting checkuser actor/comment migration variables (T233004) (duration: 07m 46s)
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44525 and previous config saved to /var/cache/conftool/dbconfig/20230213-225610-ladsgroup.json
  • 22:53 zabe@deploy1002: zabe: Backport for stop setting checkuser actor/comment migration variables (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 22:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 22:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44524 and previous config saved to /var/cache/conftool/dbconfig/20230213-225312-marostegui.json
  • 22:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp2002']
  • 22:51 zabe@deploy1002: Started scap: Backport for stop setting checkuser actor/comment migration variables (T233004)
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44523 and previous config saved to /var/cache/conftool/dbconfig/20230213-224837-marostegui.json
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2002']
  • 22:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44522 and previous config saved to /var/cache/conftool/dbconfig/20230213-224240-marostegui.json
  • 22:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44521 and previous config saved to /var/cache/conftool/dbconfig/20230213-224219-marostegui.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44520 and previous config saved to /var/cache/conftool/dbconfig/20230213-224102-ladsgroup.json
  • 22:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44519 and previous config saved to /var/cache/conftool/dbconfig/20230213-223806-marostegui.json
  • 22:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage
  • 22:36 papaul: upgrading firmware on mc-gp2002
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage
  • 22:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 22:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44518 and previous config saved to /var/cache/conftool/dbconfig/20230213-222713-marostegui.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44517 and previous config saved to /var/cache/conftool/dbconfig/20230213-222556-ladsgroup.json
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2437.codfw.wmnet with OS buster
  • 22:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44516 and previous config saved to /var/cache/conftool/dbconfig/20230213-222300-marostegui.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44515 and previous config saved to /var/cache/conftool/dbconfig/20230213-221840-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44514 and previous config saved to /var/cache/conftool/dbconfig/20230213-221815-ladsgroup.json
  • 22:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2436.codfw.wmnet with OS buster
  • 22:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44513 and previous config saved to /var/cache/conftool/dbconfig/20230213-221207-marostegui.json
  • 22:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44512 and previous config saved to /var/cache/conftool/dbconfig/20230213-220753-marostegui.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44511 and previous config saved to /var/cache/conftool/dbconfig/20230213-220308-ladsgroup.json
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 21:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44510 and previous config saved to /var/cache/conftool/dbconfig/20230213-215701-marostegui.json
  • 21:56 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 21:53 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:51 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet']
  • 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44509 and previous config saved to /var/cache/conftool/dbconfig/20230213-215055-marostegui.json
  • 21:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44508 and previous config saved to /var/cache/conftool/dbconfig/20230213-215034-marostegui.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44507 and previous config saved to /var/cache/conftool/dbconfig/20230213-214802-ladsgroup.json
  • 21:44 taavi@deploy1002: Finished scap: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis"" (duration: 09m 00s)
  • 21:42 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet']
  • 21:40 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@5edcd7b]: deploying section_topics v0.5.0 on platform_eng Airflow instance (duration: 00m 17s)
  • 21:39 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@5edcd7b]: deploying section_topics v0.5.0 on platform_eng Airflow instance
  • 21:37 taavi@deploy1002: taavi: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis"" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:35 taavi@deploy1002: Started scap: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis""
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44506 and previous config saved to /var/cache/conftool/dbconfig/20230213-213526-marostegui.json
  • 21:34 taavi@deploy1002: Finished scap: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523) (duration: 08m 38s)
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44505 and previous config saved to /var/cache/conftool/dbconfig/20230213-213256-ladsgroup.json
  • 21:27 taavi@deploy1002: taavi and matmarex: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:26 taavi@deploy1002: Started scap: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523)
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44504 and previous config saved to /var/cache/conftool/dbconfig/20230213-212529-ladsgroup.json
  • 21:25 taavi@deploy1002: Finished scap: lmowiktionary: Create extendedmover group (T327340) (duration: 08m 28s)
  • 21:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44503 and previous config saved to /var/cache/conftool/dbconfig/20230213-212020-marostegui.json
  • 21:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44502 and previous config saved to /var/cache/conftool/dbconfig/20230213-211932-ladsgroup.json
  • 21:18 taavi@deploy1002: taavi: lmowiktionary: Create extendedmover group (T327340) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:16 taavi@deploy1002: Started scap: lmowiktionary: Create extendedmover group (T327340)
  • 21:15 taavi@deploy1002: Backport cancelled.
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44501 and previous config saved to /var/cache/conftool/dbconfig/20230213-210738-marostegui.json
  • 21:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44500 and previous config saved to /var/cache/conftool/dbconfig/20230213-210717-marostegui.json
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44499 and previous config saved to /var/cache/conftool/dbconfig/20230213-210513-marostegui.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44498 and previous config saved to /var/cache/conftool/dbconfig/20230213-210426-ladsgroup.json
  • 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44497 and previous config saved to /var/cache/conftool/dbconfig/20230213-205905-marostegui.json
  • 20:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44496 and previous config saved to /var/cache/conftool/dbconfig/20230213-205855-marostegui.json
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44495 and previous config saved to /var/cache/conftool/dbconfig/20230213-205211-marostegui.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44494 and previous config saved to /var/cache/conftool/dbconfig/20230213-204920-ladsgroup.json
  • 20:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44493 and previous config saved to /var/cache/conftool/dbconfig/20230213-204348-marostegui.json
  • 20:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44492 and previous config saved to /var/cache/conftool/dbconfig/20230213-203704-marostegui.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44491 and previous config saved to /var/cache/conftool/dbconfig/20230213-203413-ladsgroup.json
  • 20:32 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 20:30 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44490 and previous config saved to /var/cache/conftool/dbconfig/20230213-202842-marostegui.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44489 and previous config saved to /var/cache/conftool/dbconfig/20230213-202656-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44488 and previous config saved to /var/cache/conftool/dbconfig/20230213-202635-ladsgroup.json
  • 20:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44487 and previous config saved to /var/cache/conftool/dbconfig/20230213-202157-marostegui.json
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44486 and previous config saved to /var/cache/conftool/dbconfig/20230213-201336-marostegui.json
  • 20:13 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:12 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 20:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bullseye
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44485 and previous config saved to /var/cache/conftool/dbconfig/20230213-201129-ladsgroup.json
  • 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44484 and previous config saved to /var/cache/conftool/dbconfig/20230213-200742-marostegui.json
  • 20:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329203)', diff saved to https://phabricator.wikimedia.org/P44483 and previous config saved to /var/cache/conftool/dbconfig/20230213-200654-marostegui.json
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44482 and previous config saved to /var/cache/conftool/dbconfig/20230213-195743-marostegui.json
  • 19:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T328817)', diff saved to https://phabricator.wikimedia.org/P44481 and previous config saved to /var/cache/conftool/dbconfig/20230213-195722-marostegui.json
  • 19:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44480 and previous config saved to /var/cache/conftool/dbconfig/20230213-195623-ladsgroup.json
  • 19:56 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 16:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44436 and previous config saved to /var/cache/conftool/dbconfig/20230213-162456-marostegui.json
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44435 and previous config saved to /var/cache/conftool/dbconfig/20230213-161824-root.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44434 and previous config saved to /var/cache/conftool/dbconfig/20230213-161605-marostegui.json
  • 16:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44433 and previous config saved to /var/cache/conftool/dbconfig/20230213-161543-marostegui.json
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44432 and previous config saved to /var/cache/conftool/dbconfig/20230213-160950-marostegui.json
  • 16:07 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44431 and previous config saved to /var/cache/conftool/dbconfig/20230213-160320-root.json
  • 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 16:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1001.eqiad.wmnet
  • 16:02 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1001"
  • 16:01 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P44429 and previous config saved to /var/cache/conftool/dbconfig/20230213-160037-marostegui.json
  • 15:59 nfraison@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 15:58 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1001"
  • 15:56 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329203)', diff saved to https://phabricator.wikimedia.org/P44428 and previous config saved to /var/cache/conftool/dbconfig/20230213-155444-marostegui.json
  • 15:51 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 15:51 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1001.eqiad.wmnet
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T329203)', diff saved to https://phabricator.wikimedia.org/P44427 and previous config saved to /var/cache/conftool/dbconfig/20230213-154850-marostegui.json
  • 15:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44426 and previous config saved to /var/cache/conftool/dbconfig/20230213-154815-root.json
  • 15:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P44425 and previous config saved to /var/cache/conftool/dbconfig/20230213-154531-marostegui.json
  • 15:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:42 dcausse: restarting blazegraph on wdqs1004 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:38 sukhe: disable puppet on A:dns-rec; merging CR 888236
  • 15:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:34 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44424 and previous config saved to /var/cache/conftool/dbconfig/20230213-153309-root.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44423 and previous config saved to /var/cache/conftool/dbconfig/20230213-153025-marostegui.json
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:22 dcausse: T327878: rebuilding CirrusSearch completion index on mnwiki from mwmaint1002
  • 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 same version (noop)
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44422 and previous config saved to /var/cache/conftool/dbconfig/20230213-151805-marostegui.json
  • 15:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php guwwiktionary --fix | tee T309054-namespaceDupes.out # T309054 [0 pages to fix, 0 were resolvable; 0 links to fix, 0 were resolvable; 0 were deleted]
  • 15:13 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Rename project namespace in guwwiktionary (T309054) (duration: 08m 33s)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44421 and previous config saved to /var/cache/conftool/dbconfig/20230213-150644-marostegui.json
  • 15:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44420 and previous config saved to /var/cache/conftool/dbconfig/20230213-150623-marostegui.json
  • 15:06 lucaswerkmeister-wmde@deploy1002: jhsoby and lucaswerkmeister-wmde: Backport for Rename project namespace in guwwiktionary (T309054) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Rename project namespace in guwwiktionary (T309054)
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T329203)', diff saved to https://phabricator.wikimedia.org/P44419 and previous config saved to /var/cache/conftool/dbconfig/20230213-150259-marostegui.json
  • 15:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439) (duration: 08m 25s)
  • 14:59 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-presto1001.eqiad.wmnet with reason: host reimage
  • 14:56 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-presto1001.eqiad.wmnet with reason: host reimage
  • 14:55 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439)
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P44418 and previous config saved to /var/cache/conftool/dbconfig/20230213-145117-marostegui.json
  • 14:47 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878) (duration: 11m 18s)
  • 14:46 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 same version (noop)
  • 14:42 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:42 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:42 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:42 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:41 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:38 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:38 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:37 lucaswerkmeister-wmde@deploy1002: dcausse and lucaswerkmeister-wmde: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P44417 and previous config saved to /var/cache/conftool/dbconfig/20230213-143611-marostegui.json
  • 14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878)
  • 14:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:35 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:34 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 14:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 14:28 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add logs-api VIP - filippo@cumin1001"
  • 14:27 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add logs-api VIP - filippo@cumin1001"
  • 14:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 14:26 lucaswerkmeister-wmde@deploy1002: backport aborted: (duration: 10m 13s)
  • 14:24 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:21 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:21 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:21 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 14:21 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44416 and previous config saved to /var/cache/conftool/dbconfig/20230213-142105-marostegui.json
  • 14:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:20 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
  • 14:17 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 14:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add iOS stream config (duration: 10m 06s)
  • 14:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 14:13 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:13 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cloudcephosd1002.eqiad.wmnet with reason: moving racks
  • 14:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cloudcephosd1002.eqiad.wmnet with reason: moving racks
  • 14:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cloudcephosd1001.eqiad.wmnet with reason: moving racks
  • 14:12 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cloudcephosd1001.eqiad.wmnet with reason: moving racks
  • 14:11 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:10 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:07 lucaswerkmeister-wmde@deploy1002: mazevedo and lucaswerkmeister-wmde: Backport for Add iOS stream config synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:07 jbond: upload node-bgpalerter_1.31.2 to apt
  • 14:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:05 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add iOS stream config
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T329203)', diff saved to https://phabricator.wikimedia.org/P44415 and previous config saved to /var/cache/conftool/dbconfig/20230213-140243-marostegui.json
  • 14:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:02 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:02 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44414 and previous config saved to /var/cache/conftool/dbconfig/20230213-140222-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44413 and previous config saved to /var/cache/conftool/dbconfig/20230213-135753-marostegui.json
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44412 and previous config saved to /var/cache/conftool/dbconfig/20230213-135732-marostegui.json
  • 13:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6677
  • 13:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6677
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44411 and previous config saved to /var/cache/conftool/dbconfig/20230213-134716-marostegui.json
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P44410 and previous config saved to /var/cache/conftool/dbconfig/20230213-134226-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44409 and previous config saved to /var/cache/conftool/dbconfig/20230213-133210-marostegui.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P44408 and previous config saved to /var/cache/conftool/dbconfig/20230213-132719-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44407 and previous config saved to /var/cache/conftool/dbconfig/20230213-131703-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44406 and previous config saved to /var/cache/conftool/dbconfig/20230213-131348-marostegui.json
  • 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44405 and previous config saved to /var/cache/conftool/dbconfig/20230213-131327-marostegui.json
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44404 and previous config saved to /var/cache/conftool/dbconfig/20230213-131213-marostegui.json
  • 13:05 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 13:01 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44403 and previous config saved to /var/cache/conftool/dbconfig/20230213-125821-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44402 and previous config saved to /var/cache/conftool/dbconfig/20230213-124853-marostegui.json
  • 12:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44401 and previous config saved to /var/cache/conftool/dbconfig/20230213-124828-marostegui.json
  • 12:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44400 and previous config saved to /var/cache/conftool/dbconfig/20230213-124314-marostegui.json
  • 12:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:40 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P44399 and previous config saved to /var/cache/conftool/dbconfig/20230213-123322-marostegui.json
  • 12:31 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:28 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44398 and previous config saved to /var/cache/conftool/dbconfig/20230213-122808-marostegui.json
  • 12:25 claime: thumbor roll-restarts done
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44397 and previous config saved to /var/cache/conftool/dbconfig/20230213-122401-marostegui.json
  • 12:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44396 and previous config saved to /var/cache/conftool/dbconfig/20230213-122339-marostegui.json
  • 12:22 claime: Roll-restart thumbor in eqiad - Deploying CR 888657
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P44395 and previous config saved to /var/cache/conftool/dbconfig/20230213-121816-marostegui.json
  • 12:15 marostegui: Upgrade db1205 and db2184 to mariadb 10.6.12 T329499
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44394 and previous config saved to /var/cache/conftool/dbconfig/20230213-120833-marostegui.json
  • 12:07 claime: Roll-restart thumbor in codfw - Deploying CR 888657
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44393 and previous config saved to /var/cache/conftool/dbconfig/20230213-120309-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44392 and previous config saved to /var/cache/conftool/dbconfig/20230213-115327-marostegui.json
  • 11:49 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:45 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install1003.wikimedia.org
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44391 and previous config saved to /var/cache/conftool/dbconfig/20230213-114002-marostegui.json
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44390 and previous config saved to /var/cache/conftool/dbconfig/20230213-113941-marostegui.json
  • 11:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44389 and previous config saved to /var/cache/conftool/dbconfig/20230213-113821-marostegui.json
  • 11:37 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:37 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44388 and previous config saved to /var/cache/conftool/dbconfig/20230213-113430-marostegui.json
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44387 and previous config saved to /var/cache/conftool/dbconfig/20230213-113408-marostegui.json
  • 11:31 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install1003.wikimedia.org
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install2003.wikimedia.org
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P44386 and previous config saved to /var/cache/conftool/dbconfig/20230213-112435-marostegui.json
  • 11:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:19 nfraison@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44385 and previous config saved to /var/cache/conftool/dbconfig/20230213-111902-marostegui.json
  • 11:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install2003.wikimedia.org
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P44384 and previous config saved to /var/cache/conftool/dbconfig/20230213-110928-marostegui.json
  • 11:08 jbond: rolling out no_proxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/879418
  • 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Cluster half broken, in the middle of upgrading
  • 11:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Cluster half broken, in the middle of upgrading
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44383 and previous config saved to /var/cache/conftool/dbconfig/20230213-110356-marostegui.json
  • 11:00 marostegui: Failover m1 from db1176 to db1164 - T329259
  • 10:56 godog: roll-restart pybal in eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/888648 - T320702
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44382 and previous config saved to /var/cache/conftool/dbconfig/20230213-105422-marostegui.json
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44381 and previous config saved to /var/cache/conftool/dbconfig/20230213-104850-marostegui.json
  • 10:48 nfraison@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44380 and previous config saved to /var/cache/conftool/dbconfig/20230213-104337-marostegui.json
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44379 and previous config saved to /var/cache/conftool/dbconfig/20230213-104316-marostegui.json
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1001.eqiad.wmnet with OS bullseye
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44378 and previous config saved to /var/cache/conftool/dbconfig/20230213-103126-marostegui.json
  • 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44377 and previous config saved to /var/cache/conftool/dbconfig/20230213-103105-marostegui.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44376 and previous config saved to /var/cache/conftool/dbconfig/20230213-102810-marostegui.json
  • 10:16 jynus: stopping bacula and disabling puppet at backup1001 for m1 switchover T329259
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P44375 and previous config saved to /var/cache/conftool/dbconfig/20230213-101559-marostegui.json
  • 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44374 and previous config saved to /var/cache/conftool/dbconfig/20230213-101304-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 10:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46375
  • 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46375
  • 10:05 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host test-reimage2001.codfw.wmnet with OS bullseye
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P44373 and previous config saved to /var/cache/conftool/dbconfig/20230213-100053-marostegui.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44372 and previous config saved to /var/cache/conftool/dbconfig/20230213-095757-marostegui.json
  • 09:54 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 09:54 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on test-reimage2001.codfw.wmnet with reason: host reimage
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44371 and previous config saved to /var/cache/conftool/dbconfig/20230213-095257-marostegui.json
  • 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329203)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230213-095231-marostegui.json
  • 09:51 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on test-reimage2001.codfw.wmnet with reason: host reimage
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44369 and previous config saved to /var/cache/conftool/dbconfig/20230213-094546-marostegui.json
  • 09:44 vgutierrez: rolling upgrade to HAProxy 2.6.8 in ulsfo - T321775
  • 09:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 09:43 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host test-reimage2001.codfw.wmnet with OS bullseye
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44368 and previous config saved to /var/cache/conftool/dbconfig/20230213-093725-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44367 and previous config saved to /var/cache/conftool/dbconfig/20230213-092228-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44366 and previous config saved to /var/cache/conftool/dbconfig/20230213-092218-marostegui.json
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44365 and previous config saved to /var/cache/conftool/dbconfig/20230213-092207-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44363 and previous config saved to /var/cache/conftool/dbconfig/20230213-090712-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P44362 and previous config saved to /var/cache/conftool/dbconfig/20230213-090701-marostegui.json
  • 09:03 moritzm: rolling restart of Apache on mw/codfw servers to pick up updated libxml
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44361 and previous config saved to /var/cache/conftool/dbconfig/20230213-090302-marostegui.json
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44360 and previous config saved to /var/cache/conftool/dbconfig/20230213-090241-marostegui.json
  • 08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P44358 and previous config saved to /var/cache/conftool/dbconfig/20230213-085154-marostegui.json
  • 08:51 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:51 Emperor: rolling-restart of codfw swift frontends
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44357 and previous config saved to /var/cache/conftool/dbconfig/20230213-084735-marostegui.json
  • 08:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44356 and previous config saved to /var/cache/conftool/dbconfig/20230213-083648-marostegui.json
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44355 and previous config saved to /var/cache/conftool/dbconfig/20230213-083229-marostegui.json
  • 08:29 taavi@deploy1002: Finished scap: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465) (duration: 19m 24s)
  • 08:27 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 08:20 taavi@deploy1002: taavi and superpes: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44354 and previous config saved to /var/cache/conftool/dbconfig/20230213-081722-marostegui.json
  • 08:17 moritzm: installing curl security updates
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44353 and previous config saved to /var/cache/conftool/dbconfig/20230213-081431-marostegui.json
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 08:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44352 and previous config saved to /var/cache/conftool/dbconfig/20230213-081344-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44351 and previous config saved to /var/cache/conftool/dbconfig/20230213-081332-marostegui.json
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44350 and previous config saved to /var/cache/conftool/dbconfig/20230213-081311-marostegui.json
  • 08:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:10 taavi@deploy1002: Started scap: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465)
  • 07:59 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:59 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44349 and previous config saved to /var/cache/conftool/dbconfig/20230213-075838-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P44348 and previous config saved to /var/cache/conftool/dbconfig/20230213-075805-marostegui.json
  • 07:55 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:54 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:54 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:53 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:47 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:46 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44347 and previous config saved to /var/cache/conftool/dbconfig/20230213-074331-marostegui.json
  • 07:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P44346 and previous config saved to /var/cache/conftool/dbconfig/20230213-074258-marostegui.json
  • 07:41 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:41 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:39 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:37 marostegui: Deploy schema change on db2151 T329260
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44345 and previous config saved to /var/cache/conftool/dbconfig/20230213-072825-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44344 and previous config saved to /var/cache/conftool/dbconfig/20230213-072752-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44343 and previous config saved to /var/cache/conftool/dbconfig/20230213-072535-marostegui.json
  • 07:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44342 and previous config saved to /var/cache/conftool/dbconfig/20230213-072514-marostegui.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44341 and previous config saved to /var/cache/conftool/dbconfig/20230213-071007-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44340 and previous config saved to /var/cache/conftool/dbconfig/20230213-070717-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44339 and previous config saved to /var/cache/conftool/dbconfig/20230213-065501-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1099 from dbctl T329181', diff saved to https://phabricator.wikimedia.org/P44338 and previous config saved to /var/cache/conftool/dbconfig/20230213-064051-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44337 and previous config saved to /var/cache/conftool/dbconfig/20230213-063955-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44336 and previous config saved to /var/cache/conftool/dbconfig/20230213-063449-marostegui.json
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:05 AndyRussG: DjangoBannerStats upgraded from c9926cfc to 5dc35ea2 on fran1001

2023-02-11

  • 01:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44335 and previous config saved to /var/cache/conftool/dbconfig/20230211-015530-marostegui.json
  • 01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P44334 and previous config saved to /var/cache/conftool/dbconfig/20230211-014023-marostegui.json
  • 01:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P44333 and previous config saved to /var/cache/conftool/dbconfig/20230211-012517-marostegui.json
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44332 and previous config saved to /var/cache/conftool/dbconfig/20230211-011010-marostegui.json
  • 01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44331 and previous config saved to /var/cache/conftool/dbconfig/20230211-010454-marostegui.json
  • 01:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 01:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 01:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44330 and previous config saved to /var/cache/conftool/dbconfig/20230211-010433-marostegui.json
  • 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P44329 and previous config saved to /var/cache/conftool/dbconfig/20230211-004927-marostegui.json
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P44328 and previous config saved to /var/cache/conftool/dbconfig/20230211-003420-marostegui.json
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44327 and previous config saved to /var/cache/conftool/dbconfig/20230211-001914-marostegui.json
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44326 and previous config saved to /var/cache/conftool/dbconfig/20230211-001658-marostegui.json
  • 00:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44325 and previous config saved to /var/cache/conftool/dbconfig/20230211-001637-marostegui.json
  • 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P44324 and previous config saved to /var/cache/conftool/dbconfig/20230211-000131-marostegui.json

2023-02-10

  • 23:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P44323 and previous config saved to /var/cache/conftool/dbconfig/20230210-234624-marostegui.json
  • 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44322 and previous config saved to /var/cache/conftool/dbconfig/20230210-233118-marostegui.json
  • 23:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44321 and previous config saved to /var/cache/conftool/dbconfig/20230210-232526-marostegui.json
  • 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44320 and previous config saved to /var/cache/conftool/dbconfig/20230210-232505-marostegui.json
  • 23:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P44319 and previous config saved to /var/cache/conftool/dbconfig/20230210-230958-marostegui.json
  • 22:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P44318 and previous config saved to /var/cache/conftool/dbconfig/20230210-225452-marostegui.json
  • 22:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:44 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44317 and previous config saved to /var/cache/conftool/dbconfig/20230210-223946-marostegui.json
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44316 and previous config saved to /var/cache/conftool/dbconfig/20230210-223430-marostegui.json
  • 22:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44315 and previous config saved to /var/cache/conftool/dbconfig/20230210-223420-marostegui.json
  • 22:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.22 refs T325585
  • 22:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P44314 and previous config saved to /var/cache/conftool/dbconfig/20230210-221914-marostegui.json
  • 22:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44313 and previous config saved to /var/cache/conftool/dbconfig/20230210-220816-marostegui.json
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P44312 and previous config saved to /var/cache/conftool/dbconfig/20230210-220408-marostegui.json
  • 21:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44311 and previous config saved to /var/cache/conftool/dbconfig/20230210-215310-marostegui.json
  • 21:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44310 and previous config saved to /var/cache/conftool/dbconfig/20230210-214901-marostegui.json
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44309 and previous config saved to /var/cache/conftool/dbconfig/20230210-214308-marostegui.json
  • 21:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44308 and previous config saved to /var/cache/conftool/dbconfig/20230210-214241-marostegui.json
  • 21:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44307 and previous config saved to /var/cache/conftool/dbconfig/20230210-213803-marostegui.json
  • 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P44306 and previous config saved to /var/cache/conftool/dbconfig/20230210-212734-marostegui.json
  • 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44305 and previous config saved to /var/cache/conftool/dbconfig/20230210-212257-marostegui.json
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44304 and previous config saved to /var/cache/conftool/dbconfig/20230210-212046-marostegui.json
  • 21:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44303 and previous config saved to /var/cache/conftool/dbconfig/20230210-212025-marostegui.json
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P44302 and previous config saved to /var/cache/conftool/dbconfig/20230210-211228-marostegui.json
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44301 and previous config saved to /var/cache/conftool/dbconfig/20230210-210519-marostegui.json
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44300 and previous config saved to /var/cache/conftool/dbconfig/20230210-205722-marostegui.json
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44299 and previous config saved to /var/cache/conftool/dbconfig/20230210-205059-marostegui.json
  • 20:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 20:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44298 and previous config saved to /var/cache/conftool/dbconfig/20230210-205012-marostegui.json
  • 20:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44297 and previous config saved to /var/cache/conftool/dbconfig/20230210-204638-marostegui.json
  • 20:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44296 and previous config saved to /var/cache/conftool/dbconfig/20230210-203506-marostegui.json
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44295 and previous config saved to /var/cache/conftool/dbconfig/20230210-203255-marostegui.json
  • 20:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44294 and previous config saved to /var/cache/conftool/dbconfig/20230210-203234-marostegui.json
  • 20:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P44293 and previous config saved to /var/cache/conftool/dbconfig/20230210-203131-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44292 and previous config saved to /var/cache/conftool/dbconfig/20230210-201728-marostegui.json
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P44291 and previous config saved to /var/cache/conftool/dbconfig/20230210-201625-marostegui.json
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44290 and previous config saved to /var/cache/conftool/dbconfig/20230210-200221-marostegui.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44289 and previous config saved to /var/cache/conftool/dbconfig/20230210-200118-marostegui.json
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44288 and previous config saved to /var/cache/conftool/dbconfig/20230210-195902-marostegui.json
  • 19:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44287 and previous config saved to /var/cache/conftool/dbconfig/20230210-195841-marostegui.json
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44286 and previous config saved to /var/cache/conftool/dbconfig/20230210-194715-marostegui.json
  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44285 and previous config saved to /var/cache/conftool/dbconfig/20230210-194504-marostegui.json
  • 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44284 and previous config saved to /var/cache/conftool/dbconfig/20230210-194443-marostegui.json
  • 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P44283 and previous config saved to /var/cache/conftool/dbconfig/20230210-194335-marostegui.json
  • 19:39 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.22 refs T325585 (duration: 06m 34s)
  • 19:33 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.22 refs T325585
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44282 and previous config saved to /var/cache/conftool/dbconfig/20230210-192935-marostegui.json
  • 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P44281 and previous config saved to /var/cache/conftool/dbconfig/20230210-192828-marostegui.json
  • 19:23 demon@deploy1002: Finished scap: Updating wikibase to fix T329233 (duration: 07m 49s)
  • 19:16 demon@deploy1002: Started scap: Updating wikibase to fix T329233
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44280 and previous config saved to /var/cache/conftool/dbconfig/20230210-191429-marostegui.json
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44279 and previous config saved to /var/cache/conftool/dbconfig/20230210-191322-marostegui.json
  • 19:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44278 and previous config saved to /var/cache/conftool/dbconfig/20230210-190711-marostegui.json
  • 19:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44277 and previous config saved to /var/cache/conftool/dbconfig/20230210-190650-marostegui.json
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44276 and previous config saved to /var/cache/conftool/dbconfig/20230210-185923-marostegui.json
  • 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44275 and previous config saved to /var/cache/conftool/dbconfig/20230210-185712-marostegui.json
  • 18:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44274 and previous config saved to /var/cache/conftool/dbconfig/20230210-185651-marostegui.json
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P44273 and previous config saved to /var/cache/conftool/dbconfig/20230210-185144-marostegui.json
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44272 and previous config saved to /var/cache/conftool/dbconfig/20230210-184144-marostegui.json
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P44271 and previous config saved to /var/cache/conftool/dbconfig/20230210-183638-marostegui.json
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44270 and previous config saved to /var/cache/conftool/dbconfig/20230210-182638-marostegui.json
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44269 and previous config saved to /var/cache/conftool/dbconfig/20230210-182131-marostegui.json
  • 18:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44268 and previous config saved to /var/cache/conftool/dbconfig/20230210-181456-marostegui.json
  • 18:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44267 and previous config saved to /var/cache/conftool/dbconfig/20230210-181132-marostegui.json
  • 18:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44266 and previous config saved to /var/cache/conftool/dbconfig/20230210-180953-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44265 and previous config saved to /var/cache/conftool/dbconfig/20230210-180921-marostegui.json
  • 18:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44264 and previous config saved to /var/cache/conftool/dbconfig/20230210-180502-marostegui.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P44263 and previous config saved to /var/cache/conftool/dbconfig/20230210-175447-marostegui.json
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44262 and previous config saved to /var/cache/conftool/dbconfig/20230210-174956-marostegui.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P44261 and previous config saved to /var/cache/conftool/dbconfig/20230210-173941-marostegui.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44260 and previous config saved to /var/cache/conftool/dbconfig/20230210-173450-marostegui.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44259 and previous config saved to /var/cache/conftool/dbconfig/20230210-172434-marostegui.json
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44258 and previous config saved to /var/cache/conftool/dbconfig/20230210-171943-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44257 and previous config saved to /var/cache/conftool/dbconfig/20230210-171818-marostegui.json
  • 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44256 and previous config saved to /var/cache/conftool/dbconfig/20230210-171757-marostegui.json
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44255 and previous config saved to /var/cache/conftool/dbconfig/20230210-171349-marostegui.json
  • 17:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44254 and previous config saved to /var/cache/conftool/dbconfig/20230210-171328-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P44253 and previous config saved to /var/cache/conftool/dbconfig/20230210-170250-marostegui.json
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44252 and previous config saved to /var/cache/conftool/dbconfig/20230210-165822-marostegui.json
  • 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:48 sukhe: reprepro -C main include bullseye-wikimedia gdnsd_3.8.0-1~wmf2_amd64.changes: T321309
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P44248 and previous config saved to /var/cache/conftool/dbconfig/20230210-164744-marostegui.json
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44247 and previous config saved to /var/cache/conftool/dbconfig/20230210-164316-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44246 and previous config saved to /var/cache/conftool/dbconfig/20230210-163238-marostegui.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44245 and previous config saved to /var/cache/conftool/dbconfig/20230210-162809-marostegui.json
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44244 and previous config saved to /var/cache/conftool/dbconfig/20230210-162615-marostegui.json
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44243 and previous config saved to /var/cache/conftool/dbconfig/20230210-162559-marostegui.json
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44242 and previous config saved to /var/cache/conftool/dbconfig/20230210-162553-marostegui.json
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44241 and previous config saved to /var/cache/conftool/dbconfig/20230210-162520-marostegui.json
  • 16:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P44240 and previous config saved to /var/cache/conftool/dbconfig/20230210-161047-marostegui.json
  • 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44239 and previous config saved to /var/cache/conftool/dbconfig/20230210-161014-marostegui.json
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ml-staging[2001-2002].codfw.wmnet,ml-staging-ctrl[2001-2002].codfw.wmnet,ml-staging-etcd2003.codfw.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 15:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ml-staging[2001-2002].codfw.wmnet,ml-staging-ctrl[2001-2002].codfw.wmnet,ml-staging-etcd2003.codfw.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P44238 and previous config saved to /var/cache/conftool/dbconfig/20230210-155541-marostegui.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44237 and previous config saved to /var/cache/conftool/dbconfig/20230210-155508-marostegui.json
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44236 and previous config saved to /var/cache/conftool/dbconfig/20230210-154034-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44235 and previous config saved to /var/cache/conftool/dbconfig/20230210-154001-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44234 and previous config saved to /var/cache/conftool/dbconfig/20230210-153411-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44233 and previous config saved to /var/cache/conftool/dbconfig/20230210-153349-marostegui.json
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44232 and previous config saved to /var/cache/conftool/dbconfig/20230210-153112-marostegui.json
  • 15:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44231 and previous config saved to /var/cache/conftool/dbconfig/20230210-153051-marostegui.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P44230 and previous config saved to /var/cache/conftool/dbconfig/20230210-151843-marostegui.json
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44229 and previous config saved to /var/cache/conftool/dbconfig/20230210-151544-marostegui.json
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2003.codfw.wmnet
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P44228 and previous config saved to /var/cache/conftool/dbconfig/20230210-150337-marostegui.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44227 and previous config saved to /var/cache/conftool/dbconfig/20230210-150038-marostegui.json
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 14:53 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 14:52 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44226 and previous config saved to /var/cache/conftool/dbconfig/20230210-144830-marostegui.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44225 and previous config saved to /var/cache/conftool/dbconfig/20230210-144530-marostegui.json
  • 14:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44224 and previous config saved to /var/cache/conftool/dbconfig/20230210-144204-marostegui.json
  • 14:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44223 and previous config saved to /var/cache/conftool/dbconfig/20230210-144143-marostegui.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44222 and previous config saved to /var/cache/conftool/dbconfig/20230210-143815-marostegui.json
  • 14:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44221 and previous config saved to /var/cache/conftool/dbconfig/20230210-143753-marostegui.json
  • 14:36 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:33 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P44220 and previous config saved to /var/cache/conftool/dbconfig/20230210-142636-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44219 and previous config saved to /var/cache/conftool/dbconfig/20230210-142247-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P44218 and previous config saved to /var/cache/conftool/dbconfig/20230210-141128-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44217 and previous config saved to /var/cache/conftool/dbconfig/20230210-140741-marostegui.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44216 and previous config saved to /var/cache/conftool/dbconfig/20230210-135622-marostegui.json
  • 13:56 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44215 and previous config saved to /var/cache/conftool/dbconfig/20230210-135345-marostegui.json
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44214 and previous config saved to /var/cache/conftool/dbconfig/20230210-135319-marostegui.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44213 and previous config saved to /var/cache/conftool/dbconfig/20230210-135235-marostegui.json
  • 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 13:48 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44212 and previous config saved to /var/cache/conftool/dbconfig/20230210-134544-marostegui.json
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44211 and previous config saved to /var/cache/conftool/dbconfig/20230210-134523-marostegui.json
  • 13:39 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44210 and previous config saved to /var/cache/conftool/dbconfig/20230210-133823-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P44209 and previous config saved to /var/cache/conftool/dbconfig/20230210-133813-marostegui.json
  • 13:36 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44208 and previous config saved to /var/cache/conftool/dbconfig/20230210-133016-marostegui.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44207 and previous config saved to /var/cache/conftool/dbconfig/20230210-132318-root.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P44206 and previous config saved to /var/cache/conftool/dbconfig/20230210-132307-marostegui.json
  • 13:21 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 13:19 volans: upgraded spicerack to 6.1.0 on the cumin hosts
  • 13:19 topranks: Adjusting evpn route export policy on lsw1-e2-eqiad to include host routes
  • 13:18 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2003.codfw.wmnet with OS bullseye
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44205 and previous config saved to /var/cache/conftool/dbconfig/20230210-131509-marostegui.json
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb1003.eqiad.wmnet
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44204 and previous config saved to /var/cache/conftool/dbconfig/20230210-130813-root.json
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44203 and previous config saved to /var/cache/conftool/dbconfig/20230210-130801-marostegui.json
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
  • 13:03 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44202 and previous config saved to /var/cache/conftool/dbconfig/20230210-130110-marostegui.json
  • 13:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44201 and previous config saved to /var/cache/conftool/dbconfig/20230210-130049-marostegui.json
  • 13:00 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44200 and previous config saved to /var/cache/conftool/dbconfig/20230210-130002-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44199 and previous config saved to /var/cache/conftool/dbconfig/20230210-125308-root.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44198 and previous config saved to /var/cache/conftool/dbconfig/20230210-125301-marostegui.json
  • 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44197 and previous config saved to /var/cache/conftool/dbconfig/20230210-125240-marostegui.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P44196 and previous config saved to /var/cache/conftool/dbconfig/20230210-124543-marostegui.json
  • 12:45 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2003.codfw.wmnet with OS bullseye
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44195 and previous config saved to /var/cache/conftool/dbconfig/20230210-123757-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44194 and previous config saved to /var/cache/conftool/dbconfig/20230210-123733-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P44193 and previous config saved to /var/cache/conftool/dbconfig/20230210-123036-marostegui.json
  • 12:26 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44192 and previous config saved to /var/cache/conftool/dbconfig/20230210-122252-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44191 and previous config saved to /var/cache/conftool/dbconfig/20230210-122227-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44190 and previous config saved to /var/cache/conftool/dbconfig/20230210-121530-marostegui.json
  • 12:13 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 12:13 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44189 and previous config saved to /var/cache/conftool/dbconfig/20230210-121252-marostegui.json
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:10 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44188 and previous config saved to /var/cache/conftool/dbconfig/20230210-120747-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44187 and previous config saved to /var/cache/conftool/dbconfig/20230210-120721-marostegui.json
  • 12:06 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:04 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 12:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:02 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329203)', diff saved to https://phabricator.wikimedia.org/P44186 and previous config saved to /var/cache/conftool/dbconfig/20230210-120123-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44185 and previous config saved to /var/cache/conftool/dbconfig/20230210-120014-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44184 and previous config saved to /var/cache/conftool/dbconfig/20230210-115953-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T329203)', diff saved to https://phabricator.wikimedia.org/P44183 and previous config saved to /var/cache/conftool/dbconfig/20230210-115913-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44182 and previous config saved to /var/cache/conftool/dbconfig/20230210-115852-marostegui.json
  • 11:58 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:56 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 11:54 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:53 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:51 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44181 and previous config saved to /var/cache/conftool/dbconfig/20230210-114447-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P44180 and previous config saved to /var/cache/conftool/dbconfig/20230210-114346-marostegui.json
  • 11:34 volans: uploaded spicerack_6.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44179 and previous config saved to /var/cache/conftool/dbconfig/20230210-112940-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P44178 and previous config saved to /var/cache/conftool/dbconfig/20230210-112840-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44177 and previous config saved to /var/cache/conftool/dbconfig/20230210-111434-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44176 and previous config saved to /var/cache/conftool/dbconfig/20230210-111333-marostegui.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44175 and previous config saved to /var/cache/conftool/dbconfig/20230210-111124-marostegui.json
  • 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44174 and previous config saved to /var/cache/conftool/dbconfig/20230210-111103-marostegui.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44173 and previous config saved to /var/cache/conftool/dbconfig/20230210-110740-marostegui.json
  • 11:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44172 and previous config saved to /var/cache/conftool/dbconfig/20230210-110715-marostegui.json
  • 11:05 moritzm: upgrade puppetdb[12]003 to bookworm T321783
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P44171 and previous config saved to /var/cache/conftool/dbconfig/20230210-105557-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P44170 and previous config saved to /var/cache/conftool/dbconfig/20230210-105208-marostegui.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P44169 and previous config saved to /var/cache/conftool/dbconfig/20230210-104051-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P44168 and previous config saved to /var/cache/conftool/dbconfig/20230210-103702-marostegui.json
  • 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34177
  • 10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34177
  • 10:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6677
  • 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6677
  • 10:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9145
  • 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9145
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install3001.wikimedia.org
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4764
  • 10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4764
  • 10:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138886
  • 10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138886
  • 10:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35467
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35467
  • 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44167 and previous config saved to /var/cache/conftool/dbconfig/20230210-102544-marostegui.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44166 and previous config saved to /var/cache/conftool/dbconfig/20230210-102156-marostegui.json
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install3001.wikimedia.org
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install4001.wikimedia.org
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44165 and previous config saved to /var/cache/conftool/dbconfig/20230210-102035-marostegui.json
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44164 and previous config saved to /var/cache/conftool/dbconfig/20230210-102014-marostegui.json
  • 10:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44163 and previous config saved to /var/cache/conftool/dbconfig/20230210-101600-marostegui.json
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44162 and previous config saved to /var/cache/conftool/dbconfig/20230210-101539-marostegui.json
  • 10:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install4001.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5001.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44161 and previous config saved to /var/cache/conftool/dbconfig/20230210-100508-marostegui.json
  • 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P44160 and previous config saved to /var/cache/conftool/dbconfig/20230210-100033-marostegui.json
  • 09:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5001.wikimedia.org
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install6001.wikimedia.org
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44159 and previous config saved to /var/cache/conftool/dbconfig/20230210-095001-marostegui.json
  • 09:45 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P44158 and previous config saved to /var/cache/conftool/dbconfig/20230210-094526-marostegui.json
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install6001.wikimedia.org
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44157 and previous config saved to /var/cache/conftool/dbconfig/20230210-093455-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44156 and previous config saved to /var/cache/conftool/dbconfig/20230210-093246-marostegui.json
  • 09:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44155 and previous config saved to /var/cache/conftool/dbconfig/20230210-093225-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44154 and previous config saved to /var/cache/conftool/dbconfig/20230210-093020-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44153 and previous config saved to /var/cache/conftool/dbconfig/20230210-092417-marostegui.json
  • 09:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44152 and previous config saved to /var/cache/conftool/dbconfig/20230210-092355-marostegui.json
  • 09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42184
  • 09:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42184
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44151 and previous config saved to /var/cache/conftool/dbconfig/20230210-091719-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44150 and previous config saved to /var/cache/conftool/dbconfig/20230210-090848-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44149 and previous config saved to /var/cache/conftool/dbconfig/20230210-090213-marostegui.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44148 and previous config saved to /var/cache/conftool/dbconfig/20230210-085342-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44147 and previous config saved to /var/cache/conftool/dbconfig/20230210-084706-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44146 and previous config saved to /var/cache/conftool/dbconfig/20230210-084457-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) T329181', diff saved to https://phabricator.wikimedia.org/P44145 and previous config saved to /var/cache/conftool/dbconfig/20230210-084452-root.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44144 and previous config saved to /var/cache/conftool/dbconfig/20230210-084430-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44143 and previous config saved to /var/cache/conftool/dbconfig/20230210-083836-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44142 and previous config saved to /var/cache/conftool/dbconfig/20230210-083140-marostegui.json
  • 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44141 and previous config saved to /var/cache/conftool/dbconfig/20230210-083119-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44140 and previous config saved to /var/cache/conftool/dbconfig/20230210-082923-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P44139 and previous config saved to /var/cache/conftool/dbconfig/20230210-081612-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44138 and previous config saved to /var/cache/conftool/dbconfig/20230210-081417-marostegui.json
  • 08:12 moritzm: installing virglrenderer security updates
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44137 and previous config saved to /var/cache/conftool/dbconfig/20230210-080841-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P44136 and previous config saved to /var/cache/conftool/dbconfig/20230210-080106-marostegui.json
  • 07:59 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44135 and previous config saved to /var/cache/conftool/dbconfig/20230210-075911-marostegui.json
  • 07:59 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44134 and previous config saved to /var/cache/conftool/dbconfig/20230210-075702-marostegui.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44133 and previous config saved to /var/cache/conftool/dbconfig/20230210-075336-root.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44132 and previous config saved to /var/cache/conftool/dbconfig/20230210-075314-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44131 and previous config saved to /var/cache/conftool/dbconfig/20230210-074600-marostegui.json
  • 07:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:41 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44130 and previous config saved to /var/cache/conftool/dbconfig/20230210-073902-marostegui.json
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44129 and previous config saved to /var/cache/conftool/dbconfig/20230210-073841-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44128 and previous config saved to /var/cache/conftool/dbconfig/20230210-073831-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44127 and previous config saved to /var/cache/conftool/dbconfig/20230210-073808-marostegui.json
  • 07:38 moritzm: installing wireshark security updates
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44126 and previous config saved to /var/cache/conftool/dbconfig/20230210-072335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44125 and previous config saved to /var/cache/conftool/dbconfig/20230210-072327-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44124 and previous config saved to /var/cache/conftool/dbconfig/20230210-072301-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44123 and previous config saved to /var/cache/conftool/dbconfig/20230210-070829-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44122 and previous config saved to /var/cache/conftool/dbconfig/20230210-070822-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44121 and previous config saved to /var/cache/conftool/dbconfig/20230210-070755-marostegui.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1098.eqiad.wmnet
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1098.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44120 and previous config saved to /var/cache/conftool/dbconfig/20230210-065322-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44119 and previous config saved to /var/cache/conftool/dbconfig/20230210-065317-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44118 and previous config saved to /var/cache/conftool/dbconfig/20230210-064728-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1098.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1098.eqiad.wmnet
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44117 and previous config saved to /var/cache/conftool/dbconfig/20230210-063812-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44116 and previous config saved to /var/cache/conftool/dbconfig/20230210-063543-marostegui.json
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44115 and previous config saved to /var/cache/conftool/dbconfig/20230210-063249-marostegui.json
  • 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes row D - pt1979@cumin2002"
  • 02:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes row D - pt1979@cumin2002"
  • 02:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:05 zabe: deployed mitigations for T326691
  • 02:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:49 Reedy: creating wbc_entity_usage on foundationwiki - T321967
  • 01:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in row C - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in row C - pt1979@cumin2002"
  • 01:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox

2023-02-09

  • 23:49 ladsgroup@deploy1002: Finished scap: Backport for Revert "Start reading from rev_comment_id in cebwiki" (duration: 09m 04s)
  • 23:41 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Start reading from rev_comment_id in cebwiki" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:39 ladsgroup@deploy1002: Started scap: Backport for Revert "Start reading from rev_comment_id in cebwiki"
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44114 and previous config saved to /var/cache/conftool/dbconfig/20230209-231522-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44113 and previous config saved to /var/cache/conftool/dbconfig/20230209-230016-ladsgroup.json
  • 22:50 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in cebwiki (T275246) (duration: 08m 26s)
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44112 and previous config saved to /var/cache/conftool/dbconfig/20230209-224509-ladsgroup.json
  • 22:44 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in cebwiki (T275246) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:42 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in cebwiki (T275246)
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44111 and previous config saved to /var/cache/conftool/dbconfig/20230209-223003-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44110 and previous config saved to /var/cache/conftool/dbconfig/20230209-222137-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44109 and previous config saved to /var/cache/conftool/dbconfig/20230209-222126-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44108 and previous config saved to /var/cache/conftool/dbconfig/20230209-220620-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44107 and previous config saved to /var/cache/conftool/dbconfig/20230209-215114-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44106 and previous config saved to /var/cache/conftool/dbconfig/20230209-213607-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44105 and previous config saved to /var/cache/conftool/dbconfig/20230209-212747-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44104 and previous config saved to /var/cache/conftool/dbconfig/20230209-212732-ladsgroup.json
  • 21:19 thcipriani@deploy1002: Sync cancelled.
  • 21:18 thcipriani@deploy1002: trainbranchbot and thcipriani: Backport for Revert "InitialiseSettings: install PageAssessments on newiki" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:17 thcipriani@deploy1002: Started scap: Backport for Revert "InitialiseSettings: install PageAssessments on newiki"
  • 21:13 thcipriani@deploy1002: backport aborted: (duration: 06m 05s)
  • 21:13 thcipriani@deploy1002: sync-world aborted: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) (duration: 04m 24s)
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44103 and previous config saved to /var/cache/conftool/dbconfig/20230209-211226-ladsgroup.json
  • 21:11 thcipriani@deploy1002: musikanimal and thcipriani: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:09 thcipriani@deploy1002: Started scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224)
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44102 and previous config saved to /var/cache/conftool/dbconfig/20230209-205720-ladsgroup.json
  • 20:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:47 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44101 and previous config saved to /var/cache/conftool/dbconfig/20230209-204214-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44100 and previous config saved to /var/cache/conftool/dbconfig/20230209-203236-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 20:31 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:27 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44099 and previous config saved to /var/cache/conftool/dbconfig/20230209-202551-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44098 and previous config saved to /var/cache/conftool/dbconfig/20230209-201045-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44097 and previous config saved to /var/cache/conftool/dbconfig/20230209-195539-ladsgroup.json
  • 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44096 and previous config saved to /var/cache/conftool/dbconfig/20230209-194730-marostegui.json
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44095 and previous config saved to /var/cache/conftool/dbconfig/20230209-194032-ladsgroup.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P44094 and previous config saved to /var/cache/conftool/dbconfig/20230209-193223-marostegui.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44093 and previous config saved to /var/cache/conftool/dbconfig/20230209-193107-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44092 and previous config saved to /var/cache/conftool/dbconfig/20230209-193057-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P44091 and previous config saved to /var/cache/conftool/dbconfig/20230209-191717-marostegui.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44090 and previous config saved to /var/cache/conftool/dbconfig/20230209-191551-ladsgroup.json
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44089 and previous config saved to /var/cache/conftool/dbconfig/20230209-190211-marostegui.json
  • 19:01 ebernhardson: start full-cluster in-place reindexing of all wiki elasticsearch clusters T147505
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44088 and previous config saved to /var/cache/conftool/dbconfig/20230209-190044-ladsgroup.json
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44087 and previous config saved to /var/cache/conftool/dbconfig/20230209-185933-marostegui.json
  • 18:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44086 and previous config saved to /var/cache/conftool/dbconfig/20230209-185912-marostegui.json
  • 18:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2001.codfw.wmnet with OS bullseye
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS buster
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44085 and previous config saved to /var/cache/conftool/dbconfig/20230209-184538-ladsgroup.json
  • 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P44084 and previous config saved to /var/cache/conftool/dbconfig/20230209-184405-marostegui.json
  • 18:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 18:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44083 and previous config saved to /var/cache/conftool/dbconfig/20230209-183611-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2431.codfw.wmnet with OS buster
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2433.codfw.wmnet with OS buster
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 18:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P44082 and previous config saved to /var/cache/conftool/dbconfig/20230209-182859-marostegui.json
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bullseye
  • 18:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2433.codfw.wmnet with reason: host reimage
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44081 and previous config saved to /var/cache/conftool/dbconfig/20230209-181353-marostegui.json
  • 18:12 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:11 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44080 and previous config saved to /var/cache/conftool/dbconfig/20230209-181115-marostegui.json
  • 18:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2433.codfw.wmnet with reason: host reimage
  • 18:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44079 and previous config saved to /var/cache/conftool/dbconfig/20230209-181043-marostegui.json
  • 18:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:09 jiji@cumin1001: Updating IPMI password on 1 hosts - jiji@cumin1001
  • 18:09 jiji@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS buster
  • 18:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 18:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 18:03 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:02 jiji@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 18:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:01 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:00 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:00 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P44078 and previous config saved to /var/cache/conftool/dbconfig/20230209-175536-marostegui.json
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2433.codfw.wmnet with OS buster
  • 17:50 jiji@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 17:41 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P44077 and previous config saved to /var/cache/conftool/dbconfig/20230209-174030-marostegui.json
  • 17:32 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e84e692]: (no justification provided) (duration: 00m 16s)
  • 17:32 mforns@deploy1002: Started deploy [airflow-dags/analytics@e84e692]: (no justification provided)
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44076 and previous config saved to /var/cache/conftool/dbconfig/20230209-173045-ladsgroup.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44075 and previous config saved to /var/cache/conftool/dbconfig/20230209-172524-marostegui.json
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44074 and previous config saved to /var/cache/conftool/dbconfig/20230209-172239-marostegui.json
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44073 and previous config saved to /var/cache/conftool/dbconfig/20230209-172129-marostegui.json
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 17:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44072 and previous config saved to /var/cache/conftool/dbconfig/20230209-171846-marostegui.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44071 and previous config saved to /var/cache/conftool/dbconfig/20230209-171539-ladsgroup.json
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P44070 and previous config saved to /var/cache/conftool/dbconfig/20230209-170732-marostegui.json
  • 17:07 moritzm: rolling restart of FPM/Apache on mw canaries to pick up curl security updates
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P44069 and previous config saved to /var/cache/conftool/dbconfig/20230209-170340-marostegui.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44068 and previous config saved to /var/cache/conftool/dbconfig/20230209-170031-ladsgroup.json
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P44067 and previous config saved to /var/cache/conftool/dbconfig/20230209-165226-marostegui.json
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P44066 and previous config saved to /var/cache/conftool/dbconfig/20230209-164834-marostegui.json
  • 16:48 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1004.eqiad.wmnet with OS bullseye
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44065 and previous config saved to /var/cache/conftool/dbconfig/20230209-164525-ladsgroup.json
  • 16:44 moritzm: installing curl security updates on buster
  • 16:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS bullseye
  • 16:38 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@caf4808]: T329089: proper reconciliation of missed page-undelete events (duration: 02m 24s)
  • 16:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44064 and previous config saved to /var/cache/conftool/dbconfig/20230209-163720-marostegui.json
  • 16:36 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@caf4808]: T329089: proper reconciliation of missed page-undelete events
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44063 and previous config saved to /var/cache/conftool/dbconfig/20230209-163559-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44062 and previous config saved to /var/cache/conftool/dbconfig/20230209-163538-ladsgroup.json
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44061 and previous config saved to /var/cache/conftool/dbconfig/20230209-163459-marostegui.json
  • 16:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44060 and previous config saved to /var/cache/conftool/dbconfig/20230209-163438-marostegui.json
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44059 and previous config saved to /var/cache/conftool/dbconfig/20230209-163327-marostegui.json
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44058 and previous config saved to /var/cache/conftool/dbconfig/20230209-162927-marostegui.json
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44057 and previous config saved to /var/cache/conftool/dbconfig/20230209-162855-marostegui.json
  • 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2055.codfw.wmnet with reason: host reimage
  • 16:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2055.codfw.wmnet with reason: host reimage
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44056 and previous config saved to /var/cache/conftool/dbconfig/20230209-162032-ladsgroup.json
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P44055 and previous config saved to /var/cache/conftool/dbconfig/20230209-161931-marostegui.json
  • 16:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 16:14 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44054 and previous config saved to /var/cache/conftool/dbconfig/20230209-161349-marostegui.json
  • 16:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 16:09 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS bullseye
  • 16:09 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc2055.codfw.wmnet with OS bullseye
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44053 and previous config saved to /var/cache/conftool/dbconfig/20230209-160525-ladsgroup.json
  • 16:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2435.codfw.wmnet with OS buster
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P44052 and previous config saved to /var/cache/conftool/dbconfig/20230209-160425-marostegui.json
  • 16:02 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1004.eqiad.wmnet with OS bullseye
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44051 and previous config saved to /var/cache/conftool/dbconfig/20230209-155843-marostegui.json
  • 15:56 jiji@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:56 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 sukhe: restart esitest.service on A:cp-text
  • 15:55 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS bullseye
  • 15:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 15:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44050 and previous config saved to /var/cache/conftool/dbconfig/20230209-155019-ladsgroup.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44049 and previous config saved to /var/cache/conftool/dbconfig/20230209-154919-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44048 and previous config saved to /var/cache/conftool/dbconfig/20230209-154347-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44047 and previous config saved to /var/cache/conftool/dbconfig/20230209-154337-marostegui.json
  • 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44046 and previous config saved to /var/cache/conftool/dbconfig/20230209-154330-marostegui.json
  • 15:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2433.codfw.wmnet with OS buster
  • 15:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44045 and previous config saved to /var/cache/conftool/dbconfig/20230209-154058-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44044 and previous config saved to /var/cache/conftool/dbconfig/20230209-154032-ladsgroup.json
  • 15:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 15:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 15:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 15:34 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:34 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 15:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS buster
  • 15:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P44043 and previous config saved to /var/cache/conftool/dbconfig/20230209-152824-marostegui.json
  • 15:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2054.codfw.wmnet with reason: host reimage
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44042 and previous config saved to /var/cache/conftool/dbconfig/20230209-152525-ladsgroup.json
  • 15:24 jiji@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 15:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2054.codfw.wmnet with reason: host reimage
  • 15:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:16 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P44041 and previous config saved to /var/cache/conftool/dbconfig/20230209-151317-marostegui.json
  • 15:12 jiji@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44040 and previous config saved to /var/cache/conftool/dbconfig/20230209-151019-ladsgroup.json
  • 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS buster
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2432.codfw.wmnet with OS buster
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS bullseye
  • 15:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 15:04 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS bullseye
  • 14:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:58 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44039 and previous config saved to /var/cache/conftool/dbconfig/20230209-145811-marostegui.json
  • 14:57 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:56 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:55 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44038 and previous config saved to /var/cache/conftool/dbconfig/20230209-145513-ladsgroup.json
  • 14:52 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44037 and previous config saved to /var/cache/conftool/dbconfig/20230209-145232-marostegui.json
  • 14:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44036 and previous config saved to /var/cache/conftool/dbconfig/20230209-145210-marostegui.json
  • 14:52 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:51 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:51 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:49 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS buster
  • 14:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2053.codfw.wmnet with reason: host reimage
  • 14:46 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@dc3cd56]: T329089: proper reconciliation of missed page-undelete events (duration: 20m 48s)
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2433.codfw.wmnet with OS buster
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44035 and previous config saved to /var/cache/conftool/dbconfig/20230209-144535-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:45 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2430.codfw.wmnet with OS buster
  • 14:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2429.codfw.wmnet with OS buster
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:44 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 14:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2053.codfw.wmnet with reason: host reimage
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44034 and previous config saved to /var/cache/conftool/dbconfig/20230209-144321-marostegui.json
  • 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44033 and previous config saved to /var/cache/conftool/dbconfig/20230209-144300-marostegui.json
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 14:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44032 and previous config saved to /var/cache/conftool/dbconfig/20230209-143828-ladsgroup.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P44031 and previous config saved to /var/cache/conftool/dbconfig/20230209-143704-marostegui.json
  • 14:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44030 and previous config saved to /var/cache/conftool/dbconfig/20230209-142754-marostegui.json
  • 14:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS bullseye
  • 14:25 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@dc3cd56]: T329089: proper reconciliation of missed page-undelete events
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44029 and previous config saved to /var/cache/conftool/dbconfig/20230209-142321-ladsgroup.json
  • 14:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2430.codfw.wmnet with reason: host reimage
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P44028 and previous config saved to /var/cache/conftool/dbconfig/20230209-142157-marostegui.json
  • 14:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2430.codfw.wmnet with reason: host reimage
  • 14:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
  • 14:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 14:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
  • 14:14 dcausse: T329089: re-playing detected inconsistencies (missing mediawiki.page-undelete events) from 2022-10-31 to 2023-02-07 to WDQS
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44027 and previous config saved to /var/cache/conftool/dbconfig/20230209-141247-marostegui.json
  • 14:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-1-unpatched.out # T328634 – finished successfully, to my surprise
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44026 and previous config saved to /var/cache/conftool/dbconfig/20230209-140815-ladsgroup.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44025 and previous config saved to /var/cache/conftool/dbconfig/20230209-140650-marostegui.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44024 and previous config saved to /var/cache/conftool/dbconfig/20230209-140124-marostegui.json
  • 14:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44023 and previous config saved to /var/cache/conftool/dbconfig/20230209-140059-marostegui.json
  • 13:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2430.codfw.wmnet with OS buster
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44022 and previous config saved to /var/cache/conftool/dbconfig/20230209-135741-marostegui.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44021 and previous config saved to /var/cache/conftool/dbconfig/20230209-135441-marostegui.json
  • 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2429.codfw.wmnet with OS buster
  • 13:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44020 and previous config saved to /var/cache/conftool/dbconfig/20230209-135420-marostegui.json
  • 13:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@fbebd61]: Update analytics actor dags spark resources (duration: 00m 13s)
  • 13:53 joal@deploy1002: Started deploy [airflow-dags/analytics@fbebd61]: Update analytics actor dags spark resources
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44019 and previous config saved to /var/cache/conftool/dbconfig/20230209-135309-ladsgroup.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P44018 and previous config saved to /var/cache/conftool/dbconfig/20230209-134553-marostegui.json
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44017 and previous config saved to /var/cache/conftool/dbconfig/20230209-134343-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44016 and previous config saved to /var/cache/conftool/dbconfig/20230209-134322-ladsgroup.json
  • 13:40 elukey: restart prometheus-statsd-exporter on ores nodes to pick up label change - T325763
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44014 and previous config saved to /var/cache/conftool/dbconfig/20230209-133914-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P44013 and previous config saved to /var/cache/conftool/dbconfig/20230209-133046-marostegui.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44012 and previous config saved to /var/cache/conftool/dbconfig/20230209-132815-ladsgroup.json
  • 13:27 hashar: phab2002: manually stopped the `phd` service. It can't start because the MariaDB server is set read-only, and it has been failing to start every 10 seconds since forever
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44011 and previous config saved to /var/cache/conftool/dbconfig/20230209-132407-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44010 and previous config saved to /var/cache/conftool/dbconfig/20230209-131540-marostegui.json
  • 13:15 moritzm: restarting Exim on MXes to pick up OpenSSL update
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44009 and previous config saved to /var/cache/conftool/dbconfig/20230209-131309-ladsgroup.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44008 and previous config saved to /var/cache/conftool/dbconfig/20230209-131010-marostegui.json
  • 13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44007 and previous config saved to /var/cache/conftool/dbconfig/20230209-130901-marostegui.json
  • 13:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P44006 and previous config saved to /var/cache/conftool/dbconfig/20230209-130555-marostegui.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44005 and previous config saved to /var/cache/conftool/dbconfig/20230209-130504-marostegui.json
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44004 and previous config saved to /var/cache/conftool/dbconfig/20230209-130442-marostegui.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44003 and previous config saved to /var/cache/conftool/dbconfig/20230209-125803-ladsgroup.json
  • 12:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS buster
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P44002 and previous config saved to /var/cache/conftool/dbconfig/20230209-125048-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44001 and previous config saved to /var/cache/conftool/dbconfig/20230209-124936-marostegui.json
  • 12:49 joal@deploy1002: Finished deploy [airflow-dags/analytics@cf9d978]: Fix analytics pageview_actor_hourly (duration: 00m 13s)
  • 12:48 joal@deploy1002: Started deploy [airflow-dags/analytics@cf9d978]: Fix analytics pageview_actor_hourly
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44000 and previous config saved to /var/cache/conftool/dbconfig/20230209-124837-ladsgroup.json
  • 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bullseye
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P43999 and previous config saved to /var/cache/conftool/dbconfig/20230209-123542-marostegui.json
  • 12:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43998 and previous config saved to /var/cache/conftool/dbconfig/20230209-123430-marostegui.json
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 12:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 12:22 phedenskog@deploy1002: Finished deploy [performance/navtiming@bb224a1]: (no justification provided) (duration: 00m 08s)
  • 12:22 phedenskog@deploy1002: Started deploy [performance/navtiming@bb224a1]: (no justification provided)
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P43997 and previous config saved to /var/cache/conftool/dbconfig/20230209-122036-marostegui.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P43996 and previous config saved to /var/cache/conftool/dbconfig/20230209-121923-marostegui.json
  • 12:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS buster
  • 12:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move Babel settings from IS.php to ext-Babel.php, part III (T308932) (duration: 06m 47s)
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P43995 and previous config saved to /var/cache/conftool/dbconfig/20230209-121705-marostegui.json
  • 12:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43994 and previous config saved to /var/cache/conftool/dbconfig/20230209-121644-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P43993 and previous config saved to /var/cache/conftool/dbconfig/20230209-121507-marostegui.json
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43992 and previous config saved to /var/cache/conftool/dbconfig/20230209-121446-marostegui.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bullseye
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb1003.eqiad.wmnet with OS bullseye
  • 12:10 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move Babel settings from IS.php to ext-Babel.php, part II (T308932) (duration: 06m 40s)
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:06 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:03 ladsgroup@deploy1002: Synchronized wmf-config/ext-Babel.php: Move Babel settings from IS.php to ext-Babel.php, part I (T308932) (duration: 07m 06s)
  • 12:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43991 and previous config saved to /var/cache/conftool/dbconfig/20230209-120138-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43990 and previous config saved to /var/cache/conftool/dbconfig/20230209-115940-marostegui.json
  • 11:57 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:57 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 11:52 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:52 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43989 and previous config saved to /var/cache/conftool/dbconfig/20230209-114632-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43988 and previous config saved to /var/cache/conftool/dbconfig/20230209-114434-marostegui.json
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bullseye
  • 11:34 marostegui: Stop mariadb on db1098 (s6 and s7) T329171
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43986 and previous config saved to /var/cache/conftool/dbconfig/20230209-113125-marostegui.json
  • 11:31 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1003.eqiad.wmnet with OS bullseye
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43985 and previous config saved to /var/cache/conftool/dbconfig/20230209-112927-marostegui.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43984 and previous config saved to /var/cache/conftool/dbconfig/20230209-112748-marostegui.json
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43983 and previous config saved to /var/cache/conftool/dbconfig/20230209-112727-marostegui.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43982 and previous config saved to /var/cache/conftool/dbconfig/20230209-112359-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43981 and previous config saved to /var/cache/conftool/dbconfig/20230209-112338-marostegui.json
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43980 and previous config saved to /var/cache/conftool/dbconfig/20230209-111220-marostegui.json
  • 11:10 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS bullseye
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43979 and previous config saved to /var/cache/conftool/dbconfig/20230209-110832-marostegui.json
  • 11:07 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 11:02 effie: powercycle mc-gp1001
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged
  • 10:58 joal@deploy1002: Finished deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor (duration: 00m 13s)
  • 10:58 joal@deploy1002: Started deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43978 and previous config saved to /var/cache/conftool/dbconfig/20230209-105714-marostegui.json
  • 10:55 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1003.eqiad.wmnet with OS bullseye
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43977 and previous config saved to /var/cache/conftool/dbconfig/20230209-105325-marostegui.json
  • 10:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage
  • 10:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43976 and previous config saved to /var/cache/conftool/dbconfig/20230209-104218-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43975 and previous config saved to /var/cache/conftool/dbconfig/20230209-104214-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43974 and previous config saved to /var/cache/conftool/dbconfig/20230209-104208-marostegui.json
  • 10:38 moritzm: installing containerd security updates
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43973 and previous config saved to /var/cache/conftool/dbconfig/20230209-103819-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43972 and previous config saved to /var/cache/conftool/dbconfig/20230209-103733-marostegui.json
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43971 and previous config saved to /var/cache/conftool/dbconfig/20230209-103712-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43970 and previous config saved to /var/cache/conftool/dbconfig/20230209-103604-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS bullseye
  • 10:31 joal@deploy1002: Finished deploy [airflow-dags/analytics@2ab6564]: Analytics deploy for 3 druid jobs and webrequest_actor jobs (duration: 00m 17s)
  • 10:31 joal@deploy1002: Started deploy [airflow-dags/analytics@2ab6564]: Analytics deploy for 3 druid jobs and webrequest_actor jobs
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43968 and previous config saved to /var/cache/conftool/dbconfig/20230209-102713-root.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43967 and previous config saved to /var/cache/conftool/dbconfig/20230209-102709-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P43966 and previous config saved to /var/cache/conftool/dbconfig/20230209-102206-marostegui.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43965 and previous config saved to /var/cache/conftool/dbconfig/20230209-101209-root.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43964 and previous config saved to /var/cache/conftool/dbconfig/20230209-101204-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P43963 and previous config saved to /var/cache/conftool/dbconfig/20230209-100700-marostegui.json
  • 10:01 kostajh: UTC morning deploys really done
  • 09:59 kharlan@deploy1002: Finished scap: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945) (duration: 09m 06s)
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43962 and previous config saved to /var/cache/conftool/dbconfig/20230209-095704-root.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43961 and previous config saved to /var/cache/conftool/dbconfig/20230209-095659-root.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43960 and previous config saved to /var/cache/conftool/dbconfig/20230209-095153-marostegui.json
  • 09:51 kharlan@deploy1002: kharlan: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:50 kharlan@deploy1002: Started scap: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945)
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43959 and previous config saved to /var/cache/conftool/dbconfig/20230209-094816-marostegui.json
  • 09:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43958 and previous config saved to /var/cache/conftool/dbconfig/20230209-094755-marostegui.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43957 and previous config saved to /var/cache/conftool/dbconfig/20230209-094159-root.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43956 and previous config saved to /var/cache/conftool/dbconfig/20230209-094154-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P43955 and previous config saved to /var/cache/conftool/dbconfig/20230209-093248-marostegui.json
  • 09:32 godog: roll-restart opensearch-dashboards to apply memory limit - T327161
  • 09:29 kharlan@deploy1002: Finished scap: Backport for Add StatusValue::hasMessagesExcept() (T272081) (duration: 09m 20s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43954 and previous config saved to /var/cache/conftool/dbconfig/20230209-092654-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43953 and previous config saved to /var/cache/conftool/dbconfig/20230209-092650-root.json
  • 09:22 kharlan@deploy1002: kharlan: Backport for Add StatusValue::hasMessagesExcept() (T272081) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:20 kharlan@deploy1002: Started scap: Backport for Add StatusValue::hasMessagesExcept() (T272081)
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P43952 and previous config saved to /var/cache/conftool/dbconfig/20230209-091742-marostegui.json
  • 09:13 moritzm: installing openssl security updates on Bullseye
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43951 and previous config saved to /var/cache/conftool/dbconfig/20230209-091149-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43950 and previous config saved to /var/cache/conftool/dbconfig/20230209-091145-root.json
  • 09:10 marostegui: Install 10.4.28 on db1107 T329011
  • 09:10 marostegui: Install 10.6.12 on db1132 T329011
  • 09:09 vgutierrez: pool cp4044 with ESI testing enabled
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 db1132', diff saved to https://phabricator.wikimedia.org/P43949 and previous config saved to /var/cache/conftool/dbconfig/20230209-090846-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43948 and previous config saved to /var/cache/conftool/dbconfig/20230209-090236-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43947 and previous config saved to /var/cache/conftool/dbconfig/20230209-090018-marostegui.json
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43946 and previous config saved to /var/cache/conftool/dbconfig/20230209-085952-marostegui.json
  • 08:57 vgutierrez: depool cp4044 - T308799
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P43945 and previous config saved to /var/cache/conftool/dbconfig/20230209-084446-marostegui.json
  • 08:42 apergos: UTC morning backport and config training window complete
  • 08:32 kartik@deploy1002: Finished scap: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154) (duration: 13m 10s)
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P43944 and previous config saved to /var/cache/conftool/dbconfig/20230209-082940-marostegui.json
  • 08:21 kartik@deploy1002: kartik: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:19 kartik@deploy1002: Started scap: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154)
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43943 and previous config saved to /var/cache/conftool/dbconfig/20230209-081433-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43942 and previous config saved to /var/cache/conftool/dbconfig/20230209-081116-marostegui.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43941 and previous config saved to /var/cache/conftool/dbconfig/20230209-081054-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P43940 and previous config saved to /var/cache/conftool/dbconfig/20230209-075548-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P43939 and previous config saved to /var/cache/conftool/dbconfig/20230209-074042-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43938 and previous config saved to /var/cache/conftool/dbconfig/20230209-072535-marostegui.json
  • 07:24 marostegui: Stop mariadb on db1117:3321 (some dbproxy IRC alerts will be triggered) T329143
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43936 and previous config saved to /var/cache/conftool/dbconfig/20230209-072204-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1098 (s6, s7) from dbctl T329171', diff saved to https://phabricator.wikimedia.org/P43935 and previous config saved to /var/cache/conftool/dbconfig/20230209-071013-marostegui.json
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:00 marostegui: Failover m3 from db1164 to db1159 - T329141
  • 06:48 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in eqiad: maintenance
  • 06:48 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in eqiad: maintenance
  • 03:34 eileen: civicrm upgraded from 07ef73b8 to efa4c485
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2428.codfw.wmnet with OS buster
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43934 and previous config saved to /var/cache/conftool/dbconfig/20230209-024920-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43933 and previous config saved to /var/cache/conftool/dbconfig/20230209-023413-ladsgroup.json
  • 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 02:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 02:28 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/PageAssessments/maintenance/purgeUnusedProjects.php --wiki zhwiki` for T326387
  • 02:23 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/PageAssessments/maintenance/purgeUnusedProjects.php --wiki zhwiki --dry-run` for T326387
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43931 and previous config saved to /var/cache/conftool/dbconfig/20230209-021907-ladsgroup.json
  • 02:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2428.codfw.wmnet with OS buster
  • 02:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2427.codfw.wmnet with OS buster
  • 02:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2426.codfw.wmnet with OS buster
  • 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43930 and previous config saved to /var/cache/conftool/dbconfig/20230209-020401-ladsgroup.json
  • 02:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2427.codfw.wmnet with reason: host reimage
  • 01:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2427.codfw.wmnet with reason: host reimage
  • 01:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 01:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 01:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2427.codfw.wmnet with OS buster
  • 01:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2426.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2425.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43929 and previous config saved to /var/cache/conftool/dbconfig/20230209-010450-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43928 and previous config saved to /var/cache/conftool/dbconfig/20230209-010429-ladsgroup.json
  • 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 01:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43927 and previous config saved to /var/cache/conftool/dbconfig/20230209-010132-marostegui.json
  • 01:01 eileen: civicrm upgraded from b5d6a790 to 07ef73b8
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 00:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 00:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43926 and previous config saved to /var/cache/conftool/dbconfig/20230209-004923-ladsgroup.json
  • 00:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P43925 and previous config saved to /var/cache/conftool/dbconfig/20230209-004625-marostegui.json
  • 00:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2425.codfw.wmnet with OS buster
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43924 and previous config saved to /var/cache/conftool/dbconfig/20230209-003416-ladsgroup.json
  • 00:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P43923 and previous config saved to /var/cache/conftool/dbconfig/20230209-003119-marostegui.json
  • 00:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43922 and previous config saved to /var/cache/conftool/dbconfig/20230209-001910-ladsgroup.json
  • 00:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43921 and previous config saved to /var/cache/conftool/dbconfig/20230209-001613-marostegui.json
  • 00:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43920 and previous config saved to /var/cache/conftool/dbconfig/20230209-001401-marostegui.json
  • 00:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 00:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43919 and previous config saved to /var/cache/conftool/dbconfig/20230209-001340-marostegui.json

2023-02-08

  • 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P43918 and previous config saved to /var/cache/conftool/dbconfig/20230208-235833-marostegui.json
  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43917 and previous config saved to /var/cache/conftool/dbconfig/20230208-235157-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43916 and previous config saved to /var/cache/conftool/dbconfig/20230208-235109-ladsgroup.json
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P43915 and previous config saved to /var/cache/conftool/dbconfig/20230208-234327-marostegui.json
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43913 and previous config saved to /var/cache/conftool/dbconfig/20230208-233603-ladsgroup.json
  • 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43912 and previous config saved to /var/cache/conftool/dbconfig/20230208-232821-marostegui.json
  • 23:27 urbanecm@deploy1002: Finished scap: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047) (duration: 08m 32s)
  • 23:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43911 and previous config saved to /var/cache/conftool/dbconfig/20230208-232608-marostegui.json
  • 23:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43910 and previous config saved to /var/cache/conftool/dbconfig/20230208-232547-marostegui.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43909 and previous config saved to /var/cache/conftool/dbconfig/20230208-232056-ladsgroup.json
  • 23:20 urbanecm@deploy1002: superpes and urbanecm: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:18 urbanecm@deploy1002: Started scap: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047)
  • 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P43908 and previous config saved to /var/cache/conftool/dbconfig/20230208-231041-marostegui.json
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43907 and previous config saved to /var/cache/conftool/dbconfig/20230208-230550-ladsgroup.json
  • 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P43906 and previous config saved to /var/cache/conftool/dbconfig/20230208-225534-marostegui.json
  • 22:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43905 and previous config saved to /var/cache/conftool/dbconfig/20230208-224028-marostegui.json
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43904 and previous config saved to /var/cache/conftool/dbconfig/20230208-223430-marostegui.json
  • 22:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43903 and previous config saved to /var/cache/conftool/dbconfig/20230208-223408-marostegui.json
  • 22:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 22:29 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21 refs T325585 (duration: 06m 29s)
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS buster
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:23 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325585
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P43902 and previous config saved to /var/cache/conftool/dbconfig/20230208-221902-marostegui.json
  • 22:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS buster
  • 22:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43901 and previous config saved to /var/cache/conftool/dbconfig/20230208-220532-ladsgroup.json
  • 22:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P43900 and previous config saved to /var/cache/conftool/dbconfig/20230208-220356-marostegui.json
  • 22:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2421.codfw.wmnet with OS buster
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 21:56 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43899 and previous config saved to /var/cache/conftool/dbconfig/20230208-214849-marostegui.json
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43898 and previous config saved to /var/cache/conftool/dbconfig/20230208-214343-marostegui.json
  • 21:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43897 and previous config saved to /var/cache/conftool/dbconfig/20230208-214322-marostegui.json
  • 21:41 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2422
  • 21:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 21:40 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2422
  • 21:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 21:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P43896 and previous config saved to /var/cache/conftool/dbconfig/20230208-212815-marostegui.json
  • 21:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS buster
  • 21:20 urbanecm@deploy1002: Finished scap: Backport for guwwikiquote: Add custom logo (T321247) (duration: 08m 10s)
  • 21:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2421.codfw.wmnet with OS buster
  • 21:13 urbanecm@deploy1002: urbanecm: Backport for guwwikiquote: Add custom logo (T321247) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P43895 and previous config saved to /var/cache/conftool/dbconfig/20230208-211309-marostegui.json
  • 21:11 urbanecm@deploy1002: Started scap: Backport for guwwikiquote: Add custom logo (T321247)
  • 21:11 urbanecm@deploy1002: Finished scap: Backport for [logos] Make logos/manage.py work again, [logos] Regenerate logos.php (duration: 07m 42s)
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43894 and previous config saved to /var/cache/conftool/dbconfig/20230208-210807-ladsgroup.json
  • 21:03 urbanecm@deploy1002: Started scap: Backport for [logos] Make logos/manage.py work again, [logos] Regenerate logos.php
  • 20:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43893 and previous config saved to /var/cache/conftool/dbconfig/20230208-205803-marostegui.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43892 and previous config saved to /var/cache/conftool/dbconfig/20230208-205301-ladsgroup.json
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43891 and previous config saved to /var/cache/conftool/dbconfig/20230208-205211-marostegui.json
  • 20:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43890 and previous config saved to /var/cache/conftool/dbconfig/20230208-205133-marostegui.json
  • 20:48 rzl: enabled puppet on C:profile::mediawiki::webserver - T306015
  • 20:43 rzl: disabling puppet on C:profile::mediawiki::webserver to merge and test 887434 - T306015
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43889 and previous config saved to /var/cache/conftool/dbconfig/20230208-203755-ladsgroup.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P43888 and previous config saved to /var/cache/conftool/dbconfig/20230208-203627-marostegui.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43887 and previous config saved to /var/cache/conftool/dbconfig/20230208-202249-ladsgroup.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P43886 and previous config saved to /var/cache/conftool/dbconfig/20230208-202120-marostegui.json
  • 20:17 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.22 refs T325585 (duration: 06m 33s)
  • 20:11 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.22 refs T325585
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43885 and previous config saved to /var/cache/conftool/dbconfig/20230208-200614-marostegui.json
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2420.codfw.wmnet with OS buster
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43884 and previous config saved to /var/cache/conftool/dbconfig/20230208-200006-marostegui.json
  • 20:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43883 and previous config saved to /var/cache/conftool/dbconfig/20230208-195542-marostegui.json
  • 19:52 dancy@deploy1002: say aborted: (duration: 00m 01s)
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43882 and previous config saved to /var/cache/conftool/dbconfig/20230208-194745-marostegui.json
  • 19:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 19:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P43881 and previous config saved to /var/cache/conftool/dbconfig/20230208-194036-marostegui.json
  • 19:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P43880 and previous config saved to /var/cache/conftool/dbconfig/20230208-193239-marostegui.json
  • 19:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P43879 and previous config saved to /var/cache/conftool/dbconfig/20230208-192530-marostegui.json
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43878 and previous config saved to /var/cache/conftool/dbconfig/20230208-192136-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43877 and previous config saved to /var/cache/conftool/dbconfig/20230208-192115-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P43876 and previous config saved to /var/cache/conftool/dbconfig/20230208-191732-marostegui.json
  • 19:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43875 and previous config saved to /var/cache/conftool/dbconfig/20230208-191023-marostegui.json
  • 19:08 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9101b03] (duration: 01m 20s)
  • 19:06 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9101b03]
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43874 and previous config saved to /var/cache/conftool/dbconfig/20230208-190608-ladsgroup.json
  • 19:06 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03] (thin): Regular analytics weekly train THIN [analytics/refinery@9101b03] (duration: 00m 07s)
  • 19:05 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03] (thin): Regular analytics weekly train THIN [analytics/refinery@9101b03]
  • 19:05 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03]: Regular analytics weekly train [analytics/refinery@9101b03] (duration: 06m 28s)
  • 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43873 and previous config saved to /var/cache/conftool/dbconfig/20230208-190410-marostegui.json
  • 19:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43872 and previous config saved to /var/cache/conftool/dbconfig/20230208-190349-marostegui.json
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43871 and previous config saved to /var/cache/conftool/dbconfig/20230208-190226-marostegui.json
  • 18:59 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03]: Regular analytics weekly train [analytics/refinery@9101b03]
  • 18:57 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2420.codfw.wmnet with OS buster
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43870 and previous config saved to /var/cache/conftool/dbconfig/20230208-185513-marostegui.json
  • 18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43869 and previous config saved to /var/cache/conftool/dbconfig/20230208-185451-marostegui.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43868 and previous config saved to /var/cache/conftool/dbconfig/20230208-185102-ladsgroup.json
  • 18:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P43867 and previous config saved to /var/cache/conftool/dbconfig/20230208-184842-marostegui.json
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P43866 and previous config saved to /var/cache/conftool/dbconfig/20230208-183945-marostegui.json
  • 18:39 mutante: adding az.wikimedia.org to DNS - approved by affcom T306015
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43865 and previous config saved to /var/cache/conftool/dbconfig/20230208-183556-ladsgroup.json
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P43864 and previous config saved to /var/cache/conftool/dbconfig/20230208-183336-marostegui.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P43863 and previous config saved to /var/cache/conftool/dbconfig/20230208-182439-marostegui.json
  • 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43862 and previous config saved to /var/cache/conftool/dbconfig/20230208-181829-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43861 and previous config saved to /var/cache/conftool/dbconfig/20230208-181222-marostegui.json
  • 18:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 18:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43860 and previous config saved to /var/cache/conftool/dbconfig/20230208-181200-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43859 and previous config saved to /var/cache/conftool/dbconfig/20230208-180933-marostegui.json
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43858 and previous config saved to /var/cache/conftool/dbconfig/20230208-180216-marostegui.json
  • 18:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43857 and previous config saved to /var/cache/conftool/dbconfig/20230208-180153-marostegui.json
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P43856 and previous config saved to /var/cache/conftool/dbconfig/20230208-175654-marostegui.json
  • 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P43855 and previous config saved to /var/cache/conftool/dbconfig/20230208-174647-marostegui.json
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P43854 and previous config saved to /var/cache/conftool/dbconfig/20230208-174148-marostegui.json
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43850 and previous config saved to /var/cache/conftool/dbconfig/20230208-172021-marostegui.json
  • 17:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43849 and previous config saved to /var/cache/conftool/dbconfig/20230208-171634-marostegui.json
  • 17:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 17:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 17:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on rpki2002.codfw.wmnet with reason: Restarting to increase VM RAM allocation
  • 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on rpki2002.codfw.wmnet with reason: Restarting to increase VM RAM allocation
  • 17:11 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in codfw: maintenance
  • 17:11 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in codfw: maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43848 and previous config saved to /var/cache/conftool/dbconfig/20230208-171028-marostegui.json
  • 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43847 and previous config saved to /var/cache/conftool/dbconfig/20230208-171006-marostegui.json
  • 17:09 jynus: disable bacula job backup1002.eqiad.wmnet-Weekly-Thu-EsRwCodfw-mysql-srv-backups-dumps-latest
  • 17:08 oblivian@cumin2002: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) status all services in codfw: maintenance
  • 17:08 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in codfw: maintenance
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P43844 and previous config saved to /var/cache/conftool/dbconfig/20230208-165500-marostegui.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:44 sukhe: [done] rolling restart of haproxy and trafficserver in A:cp
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P43842 and previous config saved to /var/cache/conftool/dbconfig/20230208-163954-marostegui.json
  • 16:27 sukhe: rolling restart of haproxy and trafficserver in A:cp
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43841 and previous config saved to /var/cache/conftool/dbconfig/20230208-162447-marostegui.json
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43840 and previous config saved to /var/cache/conftool/dbconfig/20230208-161828-marostegui.json
  • 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43839 and previous config saved to /var/cache/conftool/dbconfig/20230208-161807-marostegui.json
  • 16:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on rpki1001.eqiad.wmnet with reason: Restarting to increase VM RAM allocation
  • 16:14 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on rpki1001.eqiad.wmnet with reason: Restarting to increase VM RAM allocation
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P43838 and previous config saved to /var/cache/conftool/dbconfig/20230208-160301-marostegui.json
  • 15:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2420.codfw.wmnet with OS buster
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P43837 and previous config saved to /var/cache/conftool/dbconfig/20230208-154754-marostegui.json
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43836 and previous config saved to /var/cache/conftool/dbconfig/20230208-154420-marostegui.json
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new test VM in drmrs - jmm@cumin2002"
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43835 and previous config saved to /var/cache/conftool/dbconfig/20230208-153248-marostegui.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43834 and previous config saved to /var/cache/conftool/dbconfig/20230208-153022-marostegui.json
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43833 and previous config saved to /var/cache/conftool/dbconfig/20230208-152956-marostegui.json
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P43832 and previous config saved to /var/cache/conftool/dbconfig/20230208-152913-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P43831 and previous config saved to /var/cache/conftool/dbconfig/20230208-151450-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P43830 and previous config saved to /var/cache/conftool/dbconfig/20230208-151407-marostegui.json
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:10 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 15:00 jforrester@deploy1002: Finished scap: Backport for Add a wordmark to itwiktionary (T329168) (duration: 08m 26s)
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P43829 and previous config saved to /var/cache/conftool/dbconfig/20230208-145944-marostegui.json
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43828 and previous config saved to /var/cache/conftool/dbconfig/20230208-145901-marostegui.json
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43826 and previous config saved to /var/cache/conftool/dbconfig/20230208-145651-marostegui.json
  • 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43825 and previous config saved to /var/cache/conftool/dbconfig/20230208-145625-marostegui.json
  • 14:53 jforrester@deploy1002: superpes and jforrester: Backport for Add a wordmark to itwiktionary (T329168) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:51 jforrester@deploy1002: Started scap: Backport for Add a wordmark to itwiktionary (T329168)
  • 14:50 jforrester@deploy1002: Finished scap: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047) (duration: 07m 52s)
  • 14:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new test VM in drmrs - jmm@cumin2002"
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43824 and previous config saved to /var/cache/conftool/dbconfig/20230208-144437-marostegui.json
  • 14:44 jforrester@deploy1002: jforrester and superpes: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:42 jforrester@deploy1002: Started scap: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047)
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P43823 and previous config saved to /var/cache/conftool/dbconfig/20230208-144119-marostegui.json
  • 14:41 jforrester@deploy1002: Finished scap: Backport for Move non-variant wgMFUseWikibase to CommonSettings (duration: 07m 37s)
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43822 and previous config saved to /var/cache/conftool/dbconfig/20230208-143756-marostegui.json
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43821 and previous config saved to /var/cache/conftool/dbconfig/20230208-143735-marostegui.json
  • 14:35 jforrester@deploy1002: jforrester: Backport for Move non-variant wgMFUseWikibase to CommonSettings synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:33 jforrester@deploy1002: Started scap: Backport for Move non-variant wgMFUseWikibase to CommonSettings
  • 14:31 jforrester@deploy1002: Finished scap: Backport for Move non-variant wgMFNearby to CommonSettings (duration: 08m 52s)
  • 14:29 moritzm: test install of testvm6001 T327867
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P43820 and previous config saved to /var/cache/conftool/dbconfig/20230208-142613-marostegui.json
  • 14:24 jforrester@deploy1002: jforrester: Backport for Move non-variant wgMFNearby to CommonSettings synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:22 jforrester@deploy1002: Started scap: Backport for Move non-variant wgMFNearby to CommonSettings
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P43819 and previous config saved to /var/cache/conftool/dbconfig/20230208-142229-marostegui.json
  • 14:20 jforrester@deploy1002: Finished scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II (duration: 07m 48s)
  • 14:14 jforrester@deploy1002: jforrester: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:12 jforrester@deploy1002: Started scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II
  • 14:11 jforrester@deploy1002: Finished scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I (duration: 08m 12s)
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43818 and previous config saved to /var/cache/conftool/dbconfig/20230208-141106-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43817 and previous config saved to /var/cache/conftool/dbconfig/20230208-140859-marostegui.json
  • 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43816 and previous config saved to /var/cache/conftool/dbconfig/20230208-140837-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P43815 and previous config saved to /var/cache/conftool/dbconfig/20230208-140722-marostegui.json
  • 14:05 jforrester@deploy1002: jforrester: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:03 jforrester@deploy1002: Started scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P43813 and previous config saved to /var/cache/conftool/dbconfig/20230208-135331-marostegui.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43812 and previous config saved to /var/cache/conftool/dbconfig/20230208-135216-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43811 and previous config saved to /var/cache/conftool/dbconfig/20230208-134950-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P43810 and previous config saved to /var/cache/conftool/dbconfig/20230208-133825-marostegui.json
  • 13:33 jbond: (correction) send puppet.esams.wmnet to eqiad and puppet.eqsin.wmnet to codfw
  • 13:33 jbond: send puppet.esams.wmnet to eqiad and puppet.esams.wmnet to codfw
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43809 and previous config saved to /var/cache/conftool/dbconfig/20230208-132318-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43808 and previous config saved to /var/cache/conftool/dbconfig/20230208-131409-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43807 and previous config saved to /var/cache/conftool/dbconfig/20230208-131348-marostegui.json
  • 13:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bullseye
  • 13:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
  • 12:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P43805 and previous config saved to /var/cache/conftool/dbconfig/20230208-125841-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P43804 and previous config saved to /var/cache/conftool/dbconfig/20230208-124335-marostegui.json
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1096.eqiad.wmnet
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1096.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 12:36 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1096.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 12:34 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1096.eqiad.wmnet
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43803 and previous config saved to /var/cache/conftool/dbconfig/20230208-122829-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43802 and previous config saved to /var/cache/conftool/dbconfig/20230208-122620-marostegui.json
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43801 and previous config saved to /var/cache/conftool/dbconfig/20230208-122559-marostegui.json
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 12:19 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 12:15 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-stretch2002.codfw.wmnet
  • 12:13 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-stretch2002.codfw.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43800 and previous config saved to /var/cache/conftool/dbconfig/20230208-121053-marostegui.json
  • 12:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bullseye
  • 11:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: moss-be1001.eqiad.wmnet
  • 11:56 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: moss-be1001.eqiad.wmnet
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43799 and previous config saved to /var/cache/conftool/dbconfig/20230208-115546-marostegui.json
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: flowspec1001.eqiad.wmnet
  • 11:53 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: flowspec1001.eqiad.wmnet
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43798 and previous config saved to /var/cache/conftool/dbconfig/20230208-114040-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43797 and previous config saved to /var/cache/conftool/dbconfig/20230208-113832-marostegui.json
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui: Stop mysql on db1096 (s5,s6) T329147
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43796 and previous config saved to /var/cache/conftool/dbconfig/20230208-110507-marostegui.json
  • 10:57 zabe@deploy1002: Finished scap: Backport for Remove cul_reason comment table migration code (T233004 T329151) (duration: 08m 05s)
  • 10:51 zabe@deploy1002: zabe: Backport for Remove cul_reason comment table migration code (T233004 T329151) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43793 and previous config saved to /var/cache/conftool/dbconfig/20230208-105001-marostegui.json
  • 10:49 zabe@deploy1002: Started scap: Backport for Remove cul_reason comment table migration code (T233004 T329151)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:35 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43791 and previous config saved to /var/cache/conftool/dbconfig/20230208-103455-marostegui.json
  • 10:33 volans: deploying python3-wmflib_1.2.1 to the fleet
  • 10:28 zabe@deploy1002: Finished scap: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366) (duration: 08m 49s)
  • 10:26 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015 T329073
  • 10:21 zabe@deploy1002: zabe: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:19 zabe@deploy1002: Started scap: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366)
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43790 and previous config saved to /var/cache/conftool/dbconfig/20230208-101948-marostegui.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43789 and previous config saved to /var/cache/conftool/dbconfig/20230208-101534-marostegui.json
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43788 and previous config saved to /var/cache/conftool/dbconfig/20230208-101512-marostegui.json
  • 10:08 phedenskog@deploy1002: Finished deploy [performance/navtiming@079891a]: (no justification provided) (duration: 00m 08s)
  • 10:08 phedenskog@deploy1002: Started deploy [performance/navtiming@079891a]: (no justification provided)
  • 10:07 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with invalid version
  • 10:07 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with invalid version
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P43787 and previous config saved to /var/cache/conftool/dbconfig/20230208-100006-marostegui.json
  • 09:59 moritzm: installing openssl security updates on bullseye
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1096 (s5,s6) from dbctl T329147', diff saved to https://phabricator.wikimedia.org/P43786 and previous config saved to /var/cache/conftool/dbconfig/20230208-095207-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P43785 and previous config saved to /var/cache/conftool/dbconfig/20230208-094500-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43783 and previous config saved to /var/cache/conftool/dbconfig/20230208-092954-marostegui.json
  • 09:14 godog: purge user_auth table on grafana1002 - T328784
  • 08:54 moritzm: installing imagemagick security updates
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43782 and previous config saved to /var/cache/conftool/dbconfig/20230208-082938-marostegui.json
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43781 and previous config saved to /var/cache/conftool/dbconfig/20230208-082916-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P43780 and previous config saved to /var/cache/conftool/dbconfig/20230208-081410-marostegui.json
  • 08:02 marostegui: dbmaint deploy schema change on s3 eqiad (with replication) T328807 T328828
  • 07:59 marostegui: dbmaint deploy schema change on s1 eqiad (with replication) T328807 T328828
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P43779 and previous config saved to /var/cache/conftool/dbconfig/20230208-075903-marostegui.json
  • 07:56 marostegui: dbmaint deploy schema change on s7 eqiad (with replication) T328807 T328828
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43778 and previous config saved to /var/cache/conftool/dbconfig/20230208-074357-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43777 and previous config saved to /var/cache/conftool/dbconfig/20230208-073837-marostegui.json
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43776 and previous config saved to /var/cache/conftool/dbconfig/20230208-073816-marostegui.json
  • 07:38 marostegui: dbmaint deploy schema change on s2 eqiad (with replication) T328807 T328828
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P43775 and previous config saved to /var/cache/conftool/dbconfig/20230208-072310-marostegui.json
  • 07:19 marostegui: dbmaint deploy schema change on s5 eqiad (with replication) T328807 T328828
  • 07:18 marostegui: dbmaint deploy schema change on s4 eqiad (with replication) T328807 T328828
  • 07:18 marostegui: dbmaint deploy schema change on s8 eqiad (with replication) T328807 T328828
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P43774 and previous config saved to /var/cache/conftool/dbconfig/20230208-070803-marostegui.json
  • 07:07 marostegui: Install 10.6.12 on pc2014 T329011
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43773 and previous config saved to /var/cache/conftool/dbconfig/20230208-065257-marostegui.json
  • 06:52 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2011 back to pc1 master (duration: 16m 01s)
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43772 and previous config saved to /var/cache/conftool/dbconfig/20230208-065149-marostegui.json
  • 06:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43771 and previous config saved to /var/cache/conftool/dbconfig/20230208-064405-root.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43770 and previous config saved to /var/cache/conftool/dbconfig/20230208-064134-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43769 and previous config saved to /var/cache/conftool/dbconfig/20230208-064027-marostegui.json
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc2011 back to pc1 master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 06:36 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2011 back to pc1 master
  • 03:58 AndyRussG: payments-wiki upgraded from 53d1a58d to 61ea310d, config revision changed from 5e707565 to 4d90c211
  • 02:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2420.codfw.wmnet with OS buster
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2435']
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2434']
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2435']
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2433']
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2434']
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2432']
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2433']
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2432']
  • 00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2431']
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2430']
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2431']
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2430']
  • 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2429']
  • 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2428']
  • 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2429']
  • 00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2427']
  • 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2428']
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2426']
  • 00:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2427']
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2426']
  • 00:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2424']
  • 00:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2425']

2023-02-07

  • 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2425']
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2424']
  • 23:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2423']
  • 23:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2422']
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2423']
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2422']
  • 23:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2421']
  • 23:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2420']
  • 23:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2421']
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2420']
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
  • 22:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
  • 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
  • 22:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
  • 22:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:10 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - T327970"
  • 22:08 urbanecm@deploy1002: Finished scap: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194) (duration: 08m 23s)
  • 22:07 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - T327970"
  • 22:02 urbanecm@deploy1002: urbanecm and superpes: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:00 urbanecm@deploy1002: Started scap: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194)
  • 21:59 urbanecm@deploy1002: Finished scap: Backport for Change the trwiki logo with a temporary one (old vector) (T329047) (duration: 10m 20s)
  • 21:51 urbanecm@deploy1002: superpes and urbanecm: Backport for Change the trwiki logo with a temporary one (old vector) (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:49 urbanecm@deploy1002: Started scap: Backport for Change the trwiki logo with a temporary one (old vector) (T329047)
  • 21:48 urbanecm@deploy1002: Finished scap: Backport for Install WikiLove extension on bnwikiquote (T328834) (duration: 15m 32s)
  • 21:35 urbanecm@deploy1002: superpes and urbanecm: Backport for Install WikiLove extension on bnwikiquote (T328834) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2051.codfw.wmnet with OS bullseye
  • 21:33 urbanecm: Create extension tables for WikiLove on bnwikiquote (T328834)
  • 21:33 urbanecm@deploy1002: Started scap: Backport for Install WikiLove extension on bnwikiquote (T328834)
  • 21:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:31 urbanecm@deploy1002: Finished scap: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717) (duration: 11m 10s)
  • 21:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1053.eqiad.wmnet with OS bullseye
  • 21:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:20 urbanecm@deploy1002: Started scap: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717)
  • 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
  • 21:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
  • 21:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
  • 21:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
  • 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:02 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - Fix android session schema path (duration: 07m 26s)
  • 21:01 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1053.eqiad.wmnet with OS bullseye
  • 20:58 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2051.codfw.wmnet with OS bullseye
  • 20:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2050.codfw.wmnet with OS bullseye
  • 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1051.eqiad.wmnet with OS bullseye
  • 20:44 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1005.eqiad.wmnet
  • 20:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
  • 20:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
  • 20:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
  • 20:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
  • 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2050.codfw.wmnet with OS bullseye
  • 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1051.eqiad.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:08 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1005.eqiad.wmnet on all recursors
  • 19:57 bking@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1005.eqiad.wmnet on all recursors
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
  • 19:56 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.22 refs T325585
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:53 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:53 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1005.eqiad.wmnet
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2049.codfw.wmnet with OS bullseye
  • 19:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1049.eqiad.wmnet with OS bullseye
  • 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
  • 19:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
  • 19:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1049.eqiad.wmnet with OS bullseye
  • 19:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2049.codfw.wmnet with OS bullseye
  • 19:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
  • 19:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
  • 18:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2048.codfw.wmnet with OS bullseye
  • 18:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1047.eqiad.wmnet with OS bullseye
  • 18:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
  • 18:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
  • 18:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
  • 18:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
  • 18:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2048.codfw.wmnet with OS bullseye
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1047.eqiad.wmnet with OS bullseye
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 13 hosts
  • 18:02 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 13 hosts
  • 17:55 inflatador: bking@cumin1001 repooling elastic and wdqs hosts post-maintenance T327925
  • 17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2047.codfw.wmnet with OS bullseye
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1046.eqiad.wmnet with OS bullseye
  • 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
  • 17:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
  • 17:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
  • 17:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
  • 17:22 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1046.eqiad.wmnet with OS bullseye
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2047.codfw.wmnet with OS bullseye
  • 16:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2046.codfw.wmnet with OS bullseye
  • 16:48 urbanecm@deploy1002: Finished scap: 58f4d877: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change (T308017), 854ff4ac: Finalize mediawiki/page/change schema at 1.0.0 (T308017) (duration: 07m 32s)
  • 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1045.eqiad.wmnet with OS bullseye
  • 16:41 urbanecm@deploy1002: Started scap: 58f4d877: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change (T308017), 854ff4ac: Finalize mediawiki/page/change schema at 1.0.0 (T308017)
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43765 and previous config saved to /var/cache/conftool/dbconfig/20230207-163902-root.json
  • 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
  • 16:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
  • 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
  • 16:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43764 and previous config saved to /var/cache/conftool/dbconfig/20230207-162357-root.json
  • 16:18 urbanecm@deploy1002: Finished scap: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064) (duration: 17m 44s)
  • 16:15 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2046.codfw.wmnet with OS bullseye
  • 16:14 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1045.eqiad.wmnet with OS bullseye
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43763 and previous config saved to /var/cache/conftool/dbconfig/20230207-160852-root.json
  • 16:02 urbanecm@deploy1002: urbanecm: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:00 urbanecm@deploy1002: Started scap: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064)
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43762 and previous config saved to /var/cache/conftool/dbconfig/20230207-155347-root.json
  • 15:53 moritzm: installing tiff security updates
  • 15:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2045.codfw.wmnet with OS bullseye
  • 15:47 urbanecm@deploy1002: Finished scap: 20a79c5: Don't add custom attributes in unwrapParsoidSections() (T328268) (duration: 07m 34s)
  • 15:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1043.eqiad.wmnet with OS bullseye
  • 15:39 urbanecm@deploy1002: Started scap: 20a79c5: Don't add custom attributes in unwrapParsoidSections() (T328268)
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43761 and previous config saved to /var/cache/conftool/dbconfig/20230207-153842-root.json
  • 15:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
  • 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
  • 15:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
  • 15:26 urbanecm@deploy1002: Finished scap: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456) (duration: 10m 39s)
  • 15:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43760 and previous config saved to /var/cache/conftool/dbconfig/20230207-152337-root.json
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 15:17 urbanecm@deploy1002: matmarex and urbanecm: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 15:15 urbanecm@deploy1002: Started scap: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456)
  • 15:14 volans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T327925
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1043.eqiad.wmnet with OS bullseye
  • 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2045.codfw.wmnet with OS bullseye
  • 15:12 vgutierrez: repool codfw edge site - T327925
  • 15:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 15:09 volans@cumin2002: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 15:09 volans@cumin2002: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T327925
  • 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 15:07 volans@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in codfw: T327925
  • 15:05 marostegui: dbmaint deploy schema change on s8 T328807 T328828
  • 15:04 vgutierrez: restart pybal in lvs2010 - T327925
  • 15:01 marostegui: dbmaint deploy schema change on s6 T328807
  • 15:00 vgutierrez: restart pybal in lvs2009 - T327925
  • 14:59 marostegui: dbmaint deploy schema change on s6 T328828
  • 14:53 moritzm: adding nfraison to pwstore T328915
  • 14:46 volans@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in codfw: T327925
  • 14:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:40 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
  • 14:36 claime: repooled appserver, api_appserver, jobrunner, parsoid - T327925
  • 14:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=jobrunner
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 14:32 Emperor: pool ms-fe2009 (codfw as a whole still depooled) T327925
  • 14:28 jbond: enable puppet in codfw, ulsfo, esams post switch upgrade T327925
  • 14:26 claime: depooled appserver, api_appserver, jobrunner, parsoid - T327925
  • 14:25 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:21 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid
  • 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=appserver
  • 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=jobrunner
  • 14:18 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=api_appserver
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
  • 14:08 jbond: disable puppet in codfw, ulsfo, esams for switch upgrade T327925
  • 14:07 lucaswerkmeister-wmde@deploy1002: backport aborted: (duration: 17m 46s)
  • 14:06 XioNoX: asw-a-codfw> request system reboot all-members - T327925
  • 13:59 XioNoX: disable puppet in ulsfo/esams/codfw for codfw row A switch upgrade - T327925
  • 13:56 Emperor: depool ms-fe2009 T327925
  • 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
  • 13:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
  • 13:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 199 hosts with reason: codfw row A upgrade
  • 13:32 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) depool all active/active services in codfw: T327925
  • 13:31 vgutierrez: depool codfw edge site - T327925
  • 13:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 199 hosts with reason: codfw row A upgrade
  • 13:13 jbond: enable puppet in codfw, ulsfo and esams to allow depools post switch upgrade T327925
  • 13:11 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route depool all active/active services in codfw: T327925
  • 13:05 jbond: disable puppet in codfw, ulsfo and esams for switch upgrade T327925
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm6001.drmrs.wmnet
  • 12:28 vgutierrez: depooling authdns2001 - T327925
  • 12:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on doh2001.wikimedia.org with reason: depooled; T327925
  • 12:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on doh2001.wikimedia.org with reason: depooled; T327925
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm6001.drmrs.wmnet on all recursors
  • 12:20 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache testvm6001.drmrs.wmnet on all recursors
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
  • 12:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
  • 12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm6001.drmrs.wmnet
  • 12:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1041.eqiad.wmnet with OS bullseye
  • 11:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2044.codfw.wmnet with OS bullseye
  • 11:56 marostegui: Install 10.4.28 on db1152 T329011
  • 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 11:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
  • 11:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
  • 11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
  • 11:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
  • 11:33 moritzm: installing imagemagick security updates on buster
  • 11:29 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1041.eqiad.wmnet with OS bullseye
  • 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2044.codfw.wmnet with OS bullseye
  • 10:51 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 10:19 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:19 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bullseye
  • 10:13 oblivian@cumin2002: END (FAIL) - Cookbook sre.discovery.datacenter-route (exit_code=93) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:12 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bullseye
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2002.wikimedia.org with OS bullseye
  • 09:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 09:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
  • 09:20 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 09:20 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 09:20 akosiaris: add wiktionary to mobile-sections rerenders. T226931
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
  • 09:19 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:19 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:08 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 09:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast2002.wikimedia.org with OS bullseye
  • 08:50 vgutierrez: rolling upgrade to HAProxy 2.4.21 in cp nodes
  • 08:48 kostajh: UTC morning deploys done
  • 08:48 kharlan@deploy1002: Finished scap: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501) (duration: 12m 48s)
  • 08:37 kharlan@deploy1002: urbanecm and kharlan: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:35 kharlan@deploy1002: Started scap: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501)
  • 08:30 moritzm: installing imagemagick security updates on Thumbor T328901
  • 08:28 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Disable leveling up features in production (T328757) (duration: 12m 11s)
  • 08:18 kharlan@deploy1002: kharlan: Backport for GrowthExperiments: Disable leveling up features in production (T328757) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:16 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Disable leveling up features in production (T328757)
  • 08:14 kharlan@deploy1002: backport aborted: (duration: 00m 07s)
  • 07:00 marostegui: Failover m3 from db1159 to db1164 - T328404
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2110 in API', diff saved to https://phabricator.wikimedia.org/P43758 and previous config saved to /var/cache/conftool/dbconfig/20230207-063147-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P43757 and previous config saved to /var/cache/conftool/dbconfig/20230207-062826-root.json
  • 04:58 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.20 (duration: 02m 20s)
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.22 refs T325585 (duration: 53m 11s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.22 refs T325585

2023-02-06

  • 23:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:55 ryankemper: T327925 Depooled codfw wdqs hosts: `ryankemper@cumin2002:~$ sudo -E cumin -b 3 'wdqs[2003-2004,2009]*' 'sudo depool'`
  • 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
  • 22:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
  • 22:48 ryankemper: T327925 Banned `elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]` on codfw elastic
  • 22:42 inflatador: bking@cumin2002 banning Elastic nodes from cluster in preparation for T327925
  • 22:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2421
  • 22:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2421
  • 22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2421 DNS - pt1979@cumin2002"
  • 22:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2421 DNS - pt1979@cumin2002"
  • 22:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 zabe@deploy1002: say aborted: (duration: 00m 39s)
  • 19:30 zabe@deploy1002: backport aborted: (duration: 00m 00s)
  • 19:29 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=metawiki --signup --ip 92.62.231.190 # T328929
  • 19:27 zabe@deploy1002: backport aborted: (duration: 00m 23s)
  • 19:25 urbanecm@deploy1002: Finished scap: Backport for Add a new throttle rule (T328929) (duration: 07m 43s)
  • 19:18 urbanecm@deploy1002: Started scap: Backport for Add a new throttle rule (T328929)
  • 19:17 urbanecm@deploy1002: backport aborted: (duration: 00m 01s)
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2420 DNS - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2420 DNS - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2420
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2420
  • 18:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:48 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:10 vgutierrez: rolling upgrade to HAProxy 2.4.21 in ulsfo cp nodes
  • 14:37 moritzm: installing imagemagick security updates on buster
  • 14:13 vgutierrez: testing HAProxy 2.4.21 in cp4052 and cp4044
  • 14:11 urbanecm@deploy1002: Finished scap: Backport for New config entries for migrated android schemas (T324167) (duration: 09m 19s)
  • 14:09 vgutierrez: fetch HAProxy 2.4.21 for buster and bullseye (apt.wm.o)
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43754 and previous config saved to /var/cache/conftool/dbconfig/20230206-140753-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43753 and previous config saved to /var/cache/conftool/dbconfig/20230206-140627-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43752 and previous config saved to /var/cache/conftool/dbconfig/20230206-140623-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43751 and previous config saved to /var/cache/conftool/dbconfig/20230206-140606-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43750 and previous config saved to /var/cache/conftool/dbconfig/20230206-140602-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43749 and previous config saved to /var/cache/conftool/dbconfig/20230206-140554-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43748 and previous config saved to /var/cache/conftool/dbconfig/20230206-140549-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43747 and previous config saved to /var/cache/conftool/dbconfig/20230206-140541-root.json
  • 14:05 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided) (duration: 00m 33s)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43746 and previous config saved to /var/cache/conftool/dbconfig/20230206-140501-root.json
  • 14:05 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided)
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43745 and previous config saved to /var/cache/conftool/dbconfig/20230206-140449-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43744 and previous config saved to /var/cache/conftool/dbconfig/20230206-140433-root.json
  • 14:04 urbanecm@deploy1002: urbanecm and sharvaniharan: Backport for New config entries for migrated android schemas (T324167) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43743 and previous config saved to /var/cache/conftool/dbconfig/20230206-140405-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43742 and previous config saved to /var/cache/conftool/dbconfig/20230206-140338-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43741 and previous config saved to /var/cache/conftool/dbconfig/20230206-140333-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43740 and previous config saved to /var/cache/conftool/dbconfig/20230206-140316-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43739 and previous config saved to /var/cache/conftool/dbconfig/20230206-140310-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43738 and previous config saved to /var/cache/conftool/dbconfig/20230206-140257-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43737 and previous config saved to /var/cache/conftool/dbconfig/20230206-140249-root.json
  • 14:02 urbanecm@deploy1002: Started scap: Backport for New config entries for migrated android schemas (T324167)
  • 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43736 and previous config saved to /var/cache/conftool/dbconfig/20230206-135248-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43735 and previous config saved to /var/cache/conftool/dbconfig/20230206-135122-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43734 and previous config saved to /var/cache/conftool/dbconfig/20230206-135118-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43733 and previous config saved to /var/cache/conftool/dbconfig/20230206-135101-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43732 and previous config saved to /var/cache/conftool/dbconfig/20230206-135057-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43731 and previous config saved to /var/cache/conftool/dbconfig/20230206-135049-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43730 and previous config saved to /var/cache/conftool/dbconfig/20230206-135044-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43729 and previous config saved to /var/cache/conftool/dbconfig/20230206-135036-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43728 and previous config saved to /var/cache/conftool/dbconfig/20230206-134956-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43727 and previous config saved to /var/cache/conftool/dbconfig/20230206-134944-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43726 and previous config saved to /var/cache/conftool/dbconfig/20230206-134928-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43725 and previous config saved to /var/cache/conftool/dbconfig/20230206-134901-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43724 and previous config saved to /var/cache/conftool/dbconfig/20230206-134833-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43723 and previous config saved to /var/cache/conftool/dbconfig/20230206-134828-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43722 and previous config saved to /var/cache/conftool/dbconfig/20230206-134811-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43721 and previous config saved to /var/cache/conftool/dbconfig/20230206-134805-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43720 and previous config saved to /var/cache/conftool/dbconfig/20230206-134752-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43719 and previous config saved to /var/cache/conftool/dbconfig/20230206-134744-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43718 and previous config saved to /var/cache/conftool/dbconfig/20230206-133743-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43717 and previous config saved to /var/cache/conftool/dbconfig/20230206-133618-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43716 and previous config saved to /var/cache/conftool/dbconfig/20230206-133613-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43715 and previous config saved to /var/cache/conftool/dbconfig/20230206-133556-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43714 and previous config saved to /var/cache/conftool/dbconfig/20230206-133552-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43713 and previous config saved to /var/cache/conftool/dbconfig/20230206-133544-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43712 and previous config saved to /var/cache/conftool/dbconfig/20230206-133540-root.json
  • 13:35 jbond: add confd to bookworm repos
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43711 and previous config saved to /var/cache/conftool/dbconfig/20230206-133531-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43710 and previous config saved to /var/cache/conftool/dbconfig/20230206-133451-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43709 and previous config saved to /var/cache/conftool/dbconfig/20230206-133439-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43708 and previous config saved to /var/cache/conftool/dbconfig/20230206-133423-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43707 and previous config saved to /var/cache/conftool/dbconfig/20230206-133356-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43706 and previous config saved to /var/cache/conftool/dbconfig/20230206-133329-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43705 and previous config saved to /var/cache/conftool/dbconfig/20230206-133323-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43704 and previous config saved to /var/cache/conftool/dbconfig/20230206-133306-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43703 and previous config saved to /var/cache/conftool/dbconfig/20230206-133300-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43702 and previous config saved to /var/cache/conftool/dbconfig/20230206-133247-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43701 and previous config saved to /var/cache/conftool/dbconfig/20230206-133239-root.json
  • 13:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:26 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43700 and previous config saved to /var/cache/conftool/dbconfig/20230206-132238-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43699 and previous config saved to /var/cache/conftool/dbconfig/20230206-132113-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43698 and previous config saved to /var/cache/conftool/dbconfig/20230206-132108-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43697 and previous config saved to /var/cache/conftool/dbconfig/20230206-132051-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43696 and previous config saved to /var/cache/conftool/dbconfig/20230206-132047-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43695 and previous config saved to /var/cache/conftool/dbconfig/20230206-132039-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43694 and previous config saved to /var/cache/conftool/dbconfig/20230206-132035-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43693 and previous config saved to /var/cache/conftool/dbconfig/20230206-132026-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43692 and previous config saved to /var/cache/conftool/dbconfig/20230206-131947-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43691 and previous config saved to /var/cache/conftool/dbconfig/20230206-131934-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43690 and previous config saved to /var/cache/conftool/dbconfig/20230206-131918-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43689 and previous config saved to /var/cache/conftool/dbconfig/20230206-131851-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43688 and previous config saved to /var/cache/conftool/dbconfig/20230206-131824-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43687 and previous config saved to /var/cache/conftool/dbconfig/20230206-131818-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43686 and previous config saved to /var/cache/conftool/dbconfig/20230206-131801-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43685 and previous config saved to /var/cache/conftool/dbconfig/20230206-131755-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43684 and previous config saved to /var/cache/conftool/dbconfig/20230206-131740-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43683 and previous config saved to /var/cache/conftool/dbconfig/20230206-131734-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43682 and previous config saved to /var/cache/conftool/dbconfig/20230206-130733-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43681 and previous config saved to /var/cache/conftool/dbconfig/20230206-130608-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43680 and previous config saved to /var/cache/conftool/dbconfig/20230206-130603-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43679 and previous config saved to /var/cache/conftool/dbconfig/20230206-130547-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43678 and previous config saved to /var/cache/conftool/dbconfig/20230206-130542-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43677 and previous config saved to /var/cache/conftool/dbconfig/20230206-130534-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43676 and previous config saved to /var/cache/conftool/dbconfig/20230206-130530-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43675 and previous config saved to /var/cache/conftool/dbconfig/20230206-130521-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43674 and previous config saved to /var/cache/conftool/dbconfig/20230206-130442-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43673 and previous config saved to /var/cache/conftool/dbconfig/20230206-130429-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43672 and previous config saved to /var/cache/conftool/dbconfig/20230206-130414-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43671 and previous config saved to /var/cache/conftool/dbconfig/20230206-130346-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43670 and previous config saved to /var/cache/conftool/dbconfig/20230206-130319-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43669 and previous config saved to /var/cache/conftool/dbconfig/20230206-130313-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43668 and previous config saved to /var/cache/conftool/dbconfig/20230206-130256-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43667 and previous config saved to /var/cache/conftool/dbconfig/20230206-130250-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43666 and previous config saved to /var/cache/conftool/dbconfig/20230206-130235-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43665 and previous config saved to /var/cache/conftool/dbconfig/20230206-130230-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43664 and previous config saved to /var/cache/conftool/dbconfig/20230206-125228-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43663 and previous config saved to /var/cache/conftool/dbconfig/20230206-125103-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43662 and previous config saved to /var/cache/conftool/dbconfig/20230206-125059-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43661 and previous config saved to /var/cache/conftool/dbconfig/20230206-125042-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43660 and previous config saved to /var/cache/conftool/dbconfig/20230206-125037-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43659 and previous config saved to /var/cache/conftool/dbconfig/20230206-125029-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43658 and previous config saved to /var/cache/conftool/dbconfig/20230206-125025-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43657 and previous config saved to /var/cache/conftool/dbconfig/20230206-125017-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43656 and previous config saved to /var/cache/conftool/dbconfig/20230206-124937-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43655 and previous config saved to /var/cache/conftool/dbconfig/20230206-124924-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43654 and previous config saved to /var/cache/conftool/dbconfig/20230206-124909-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43653 and previous config saved to /var/cache/conftool/dbconfig/20230206-124841-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43652 and previous config saved to /var/cache/conftool/dbconfig/20230206-124814-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43651 and previous config saved to /var/cache/conftool/dbconfig/20230206-124808-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43650 and previous config saved to /var/cache/conftool/dbconfig/20230206-124751-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43649 and previous config saved to /var/cache/conftool/dbconfig/20230206-124745-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43648 and previous config saved to /var/cache/conftool/dbconfig/20230206-124730-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43647 and previous config saved to /var/cache/conftool/dbconfig/20230206-124725-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43646 and previous config saved to /var/cache/conftool/dbconfig/20230206-124629-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43645 and previous config saved to /var/cache/conftool/dbconfig/20230206-124617-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43644 and previous config saved to /var/cache/conftool/dbconfig/20230206-124513-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43643 and previous config saved to /var/cache/conftool/dbconfig/20230206-124506-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43642 and previous config saved to /var/cache/conftool/dbconfig/20230206-123124-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43641 and previous config saved to /var/cache/conftool/dbconfig/20230206-123112-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43640 and previous config saved to /var/cache/conftool/dbconfig/20230206-123007-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43639 and previous config saved to /var/cache/conftool/dbconfig/20230206-123001-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43638 and previous config saved to /var/cache/conftool/dbconfig/20230206-121619-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43637 and previous config saved to /var/cache/conftool/dbconfig/20230206-121608-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43636 and previous config saved to /var/cache/conftool/dbconfig/20230206-121503-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43635 and previous config saved to /var/cache/conftool/dbconfig/20230206-121456-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43634 and previous config saved to /var/cache/conftool/dbconfig/20230206-120114-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43633 and previous config saved to /var/cache/conftool/dbconfig/20230206-120103-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43631 and previous config saved to /var/cache/conftool/dbconfig/20230206-115958-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43630 and previous config saved to /var/cache/conftool/dbconfig/20230206-115951-root.json
  • 11:58 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1108.eqiad.wmnet
  • 11:47 jbond: puppetmaster[12]002 reintroduced to services
  • 11:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host db1108.eqiad.wmnet
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43629 and previous config saved to /var/cache/conftool/dbconfig/20230206-114609-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43628 and previous config saved to /var/cache/conftool/dbconfig/20230206-114558-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43627 and previous config saved to /var/cache/conftool/dbconfig/20230206-114453-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43626 and previous config saved to /var/cache/conftool/dbconfig/20230206-114446-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43625 and previous config saved to /var/cache/conftool/dbconfig/20230206-113104-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43624 and previous config saved to /var/cache/conftool/dbconfig/20230206-113053-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43623 and previous config saved to /var/cache/conftool/dbconfig/20230206-112948-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43622 and previous config saved to /var/cache/conftool/dbconfig/20230206-112942-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43621 and previous config saved to /var/cache/conftool/dbconfig/20230206-112900-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43620 and previous config saved to /var/cache/conftool/dbconfig/20230206-112856-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43619 and previous config saved to /var/cache/conftool/dbconfig/20230206-112839-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43618 and previous config saved to /var/cache/conftool/dbconfig/20230206-112832-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43617 and previous config saved to /var/cache/conftool/dbconfig/20230206-112825-root.json
  • 11:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetmaster2002.codfw.wmnet,puppetmaster1002.eqiad.wmnet with reason: Decom
  • 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on puppetmaster2002.codfw.wmnet,puppetmaster1002.eqiad.wmnet with reason: Decom
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43616 and previous config saved to /var/cache/conftool/dbconfig/20230206-111356-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43615 and previous config saved to /var/cache/conftool/dbconfig/20230206-111351-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43614 and previous config saved to /var/cache/conftool/dbconfig/20230206-111334-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43613 and previous config saved to /var/cache/conftool/dbconfig/20230206-111327-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43612 and previous config saved to /var/cache/conftool/dbconfig/20230206-111320-root.json
  • 11:03 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:03 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:03 akosiaris: deploy changeprop 0.10.19, adding wikivoyage to list of domains the mobile-sections get rerendered for. T226931
  • 11:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:02 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:01 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:01 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:59 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:58 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:58 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43610 and previous config saved to /var/cache/conftool/dbconfig/20230206-105851-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43609 and previous config saved to /var/cache/conftool/dbconfig/20230206-105846-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43608 and previous config saved to /var/cache/conftool/dbconfig/20230206-105829-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43607 and previous config saved to /var/cache/conftool/dbconfig/20230206-105822-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43606 and previous config saved to /var/cache/conftool/dbconfig/20230206-105815-root.json
  • 10:56 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43605 and previous config saved to /var/cache/conftool/dbconfig/20230206-104346-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43604 and previous config saved to /var/cache/conftool/dbconfig/20230206-104341-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43603 and previous config saved to /var/cache/conftool/dbconfig/20230206-104324-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43602 and previous config saved to /var/cache/conftool/dbconfig/20230206-104317-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43601 and previous config saved to /var/cache/conftool/dbconfig/20230206-104310-root.json
  • 10:36 marostegui: Upgrade db1115 (db_inventory master) to 10.6. T328408
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43600 and previous config saved to /var/cache/conftool/dbconfig/20230206-102841-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43599 and previous config saved to /var/cache/conftool/dbconfig/20230206-102837-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43598 and previous config saved to /var/cache/conftool/dbconfig/20230206-102820-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43597 and previous config saved to /var/cache/conftool/dbconfig/20230206-102812-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43596 and previous config saved to /var/cache/conftool/dbconfig/20230206-102806-root.json
  • 10:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 10:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 10:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43595 and previous config saved to /var/cache/conftool/dbconfig/20230206-101336-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43594 and previous config saved to /var/cache/conftool/dbconfig/20230206-101332-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43593 and previous config saved to /var/cache/conftool/dbconfig/20230206-101315-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43592 and previous config saved to /var/cache/conftool/dbconfig/20230206-101308-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43591 and previous config saved to /var/cache/conftool/dbconfig/20230206-101301-root.json
  • 10:10 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided) (duration: 00m 38s)
  • 10:09 hashar@deploy1002: Started deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided)
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:05 urbanecm@deploy1002: Finished scap: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739) (duration: 18m 56s)
  • 09:05 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:04 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:56 urbanecm@deploy1002: wmde-fisch and urbanecm: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:46 urbanecm@deploy1002: Started scap: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739)
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2094 db2097 db2103 db2104 db2105 db2106 db2121 db2122 db2132 db2133 db2136 db2142 db2145 db2146 db2153 db2154 db2155 db2156 db2157 db2158 db2175 db2176 db2183 T327925', diff saved to https://phabricator.wikimedia.org/P43587 and previous config saved to /var/cache/conftool/dbconfig/20230206-073015-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020 es2024 es2026 es2027 es2028 T327925', diff saved to https://phabricator.wikimedia.org/P43586 and previous config saved to /var/cache/conftool/dbconfig/20230206-071913-root.json
  • 07:17 hashar: Restarted Gerrit for deployment
  • 07:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json (duration: 00m 05s)
  • 07:14 hashar@deploy1002: Started deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json
  • 07:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json | T328134 (duration: 00m 10s)
  • 07:06 hashar@deploy1002: Started deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json | T328134

2023-02-05

  • 22:28 topranks: Re-enabling peering to Seabone/Telecom Italia AS 6762 on cr2-esams at AMS-IX
  • 14:39 cdanis: silenced NELHigh alert for 20 hours: Telecom Italia issues; alertmanager silence id 3fb3b999-9756-44af-a1e8-fd1faae8b9bf
  • 11:49 topranks: Manually deactivating peering to Telecom Italia / Seabone at AMS-IX on cr2-esams as they are having issues

2023-02-03

  • 21:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 21:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 21:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:44 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS bullseye
  • 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
  • 18:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
  • 18:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
  • 18:49 topranks: Enabling 4x10G channelization for pic 0 QSFP 4 on cr1-codfw
  • 18:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
  • 18:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS bullseye
  • 18:23 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 18:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1088.eqiad.wmnet with OS bullseye
  • 17:57 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 17:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 17:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet
  • 17:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS bullseye
  • 17:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1088.eqiad.wmnet with OS bullseye
  • 17:34 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1086.eqiad.wmnet
  • 17:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS bullseye
  • 17:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
  • 17:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
  • 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS bullseye
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS bullseye
  • 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 16:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 16:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2012.codfw.wmnet
  • 16:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2012.codfw.wmnet
  • 15:51 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): test (duration: 00m 26s)
  • 15:51 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): test
  • 15:23 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection (duration: 00m 15s)
  • 15:22 milimetric@deploy1002: Started deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection
  • 14:31 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 09s)
  • 14:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 14:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2011.codfw.wmnet
  • 14:19 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 23s)
  • 14:18 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 14:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2011.codfw.wmnet
  • 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=ats-be
  • 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=cdn
  • 13:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS bullseye
  • 13:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS bullseye
  • 12:09 moritzm: installing node-moment security updates
  • 12:01 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 13s)
  • 12:00 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2010.codfw.wmnet
  • 11:58 moritzm: installing node-qs security updates
  • 11:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2010.codfw.wmnet
  • 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2009.codfw.wmnet
  • 11:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2009.codfw.wmnet
  • 10:44 moritzm: updating perf on buster hosts
  • 10:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:11 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2008.codfw.wmnet
  • 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:06 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2008.codfw.wmnet
  • 09:51 moritzm: installing ruby-rack security updates
  • 09:31 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:31 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:24 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:23 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:23 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:19 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
  • 09:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1001.eqiad.wmnet
  • 09:13 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:07 moritzm: installing modsecurity-crs security updates
  • 09:02 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:02 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
  • 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1084.eqiad.wmnet
  • 05:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS bullseye
  • 05:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1085.eqiad.wmnet with OS bullseye
  • 04:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS bullseye
  • 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS bullseye
  • 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1083.eqiad.wmnet
  • 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet
  • 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS bullseye
  • 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS bullseye
  • 03:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 03:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS bullseye
  • 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS bullseye
  • 03:20 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1080.eqiad.wmnet
  • 03:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS bullseye
  • 02:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 02:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=ats-be
  • 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=cdn
  • 02:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS bullseye
  • 02:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
  • 02:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 02:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 01:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS bullseye
  • 01:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
  • 00:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye

2023-02-02

  • 22:58 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
  • 22:15 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1079.eqiad.wmnet
  • 22:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS bullseye
  • 22:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
  • 22:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1078.eqiad.wmnet
  • 21:58 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment everywhere (T233004) (duration: 07m 58s)
  • 21:52 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment everywhere (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:50 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment everywhere (T233004)
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS bullseye
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:30 brennen: end of utc late backport & config window
  • 21:30 brennen@deploy1002: Finished scap: Backport for Enable client preferences everywhere (T327979) (duration: 11m 14s)
  • 21:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS bullseye
  • 21:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 21:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS bullseye
  • 21:21 brennen@deploy1002: brennen and nray: Backport for Enable client preferences everywhere (T327979) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:19 brennen@deploy1002: Started scap: Backport for Enable client preferences everywhere (T327979)
  • 21:18 brennen@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004) (duration: 12m 02s)
  • 21:07 brennen@deploy1002: brennen and dreamyjazz: Backport for Disable write old for CheckUserLog reason everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004)
  • 20:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1078.eqiad.wmnet with OS bullseye
  • 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS bullseye
  • 20:23 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.3-1+deb11u1_amd64.changes # T328280
  • 20:21 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.3-1_amd64.changes # T328280
  • 20:11 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) (duration: 09m 39s)
  • 20:03 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
  • 20:01 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004)
  • 19:55 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 19:54 ryankemper: T328674 [Elastic] With puppet disabled on elastic* fleet, `ryankemper@elastic2037:~$ sudo run-puppet-agent --force` to verify changes in https://gerrit.wikimedia.org/r/886055
  • 19:30 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.21 refs T325584
  • 19:28 zabe@deploy1002: say aborted: (duration: 00m 03s)
  • 18:42 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004) (duration: 08m 19s)
  • 18:36 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:34 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004)
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2043.codfw.wmnet with OS bullseye
  • 18:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bullseye
  • 17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:33 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2043.codfw.wmnet with OS bullseye
  • 17:32 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bullseye
  • 17:29 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 17:12 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 17:12 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:53 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:52 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:51 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installation of scap version "4.34.0" completed for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installing scap version "4.34.0" for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:49 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:47 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2007.codfw.wmnet
  • 16:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2007.codfw.wmnet
  • 16:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:15 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:10 volans: uploaded python3-wmflib_1.2.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:40 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided) (duration: 07m 01s)
  • 15:38 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:37 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:34 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:33 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided)
  • 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti3004
  • 15:17 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti3004
  • 15:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2006.codfw.wmnet
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:00 vgutierrez: rolling restart of varnish in cache::text - T315676
  • 14:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2006.codfw.wmnet
  • 14:55 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:39 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2005.codfw.wmnet
  • 14:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:25 moritzm: installing containerd security updates on codfw k8s nodes
  • 14:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2005.codfw.wmnet
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
  • 13:10 kharlan: Deployed security patch for T328643
  • 13:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS bullseye
  • 13:04 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 kharlan: Deployed security patch for T328643
  • 13:02 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2004.codfw.wmnet
  • 13:00 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2004.codfw.wmnet
  • 12:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:47 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:29 btullis@deploy1002: Finished deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade (duration: 00m 42s)
  • 12:29 claime: Work ongoing on m2 and m3
  • 12:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2003.codfw.wmnet
  • 12:29 btullis@deploy1002: Started deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade
  • 12:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS bullseye
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2003.codfw.wmnet
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:46 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:40 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:38 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes-4.out # T328634 – made some progress then errored out again
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-3.out # T328634 – seemed to finish the first 20 pages and then go into an infinite loop, I Ctrl+Ced it
  • 11:28 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-2.out # T328634 – another error but made more progress
  • 11:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes.out # T328634 – failed quickly, details in task
  • 11:22 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 11:22 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2002.codfw.wmnet
  • 10:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2002.codfw.wmnet
  • 10:17 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:11 moritzm: restarting FPM on mw canaries to pick up tiff security updates
  • 10:04 moritzm: installing tiff security updates
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2001.codfw.wmnet
  • 09:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2001.codfw.wmnet
  • 09:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 09:40 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 398143
  • 09:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398143
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 09:13 apergos: UTC morning backport and config training window done
  • 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 09:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 09:11 elukey: roll restart of eventgate-main pods in wikikube eqiad/codfw to pick up new stream configs - T328576
  • 08:57 ariel@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) (duration: 10m 56s)
  • 08:48 ariel@deploy1002: ariel and aishik: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:46 ariel@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)
  • 08:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 08:37 tgr@deploy1002: Finished scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) (duration: 14m 26s)
  • 08:27 tgr@deploy1002: tgr: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:23 tgr@deploy1002: Started scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)
  • 06:17 kart_: Updated cxserver to 2023-02-02-004918-production (T129470, T172035, T327842)
  • 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
  • 03:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
  • 03:21 ejegg: payments-wiki upgraded from f20a2208 to 53d1a58d
  • 02:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 02:14 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
  • 01:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 01:55 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 01:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 01:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS bullseye
  • 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS bullseye
  • 00:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
  • 00:06 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
  • 00:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye

2023-02-01

  • 23:45 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) (duration: 08m 07s)
  • 23:39 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 23:37 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)
  • 23:31 rzl@cumin2002: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P43574 and previous config saved to /var/cache/conftool/dbconfig/20230201-233140-rzl.json
  • 23:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 23:17 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 06m 57s)
  • 23:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 23:01 zabe@deploy1002: Finished scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) (duration: 07m 45s)
  • 22:55 zabe@deploy1002: zabe: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 22:53 zabe@deploy1002: Started scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)
  • 22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) (duration: 13m 03s)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 22:40 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5022.eqsin.wmnet with OS bullseye
  • 22:38 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:36 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004)
  • 22:32 kindrobot: close UTC late backport window
  • 22:31 kindrobot@deploy1002: Finished scap: Backport for Enable client preferences for group1 (T327979) (duration: 10m 37s)
  • 22:22 kindrobot@deploy1002: nray and kindrobot: Backport for Enable client preferences for group1 (T327979) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:21 kindrobot@deploy1002: Started scap: Backport for Enable client preferences for group1 (T327979)
  • 22:14 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) (duration: 18m 14s)
  • 21:57 kindrobot@deploy1002: kindrobot and sbailey: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:56 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612)
  • 21:53 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:52 kindrobot@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) (duration: 14m 53s)
  • 21:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 21:39 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:39 kindrobot@deploy1002: dreamyjazz and kindrobot: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:37 kindrobot@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004)
  • 21:32 kindrobot@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) (duration: 13m 56s)
  • 21:26 eevans@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:24 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:20 kindrobot@deploy1002: arlolra and kindrobot: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:18 kindrobot@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)
  • 21:14 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS bullseye
  • 21:03 kindrobot: start UTC late backport deployment window
  • 21:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 20:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:44 eevans@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 20:43 urandom: depooling sessionstore —codfw— in preparation for Cassandra restarts — T327675
  • 20:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 20:38 eevans@puppetmaster1001: conftool action : get/pooled; selector: dnsdisc=$SERVICE,name=$DC
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS bullseye
  • 20:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS bullseye
  • 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS bullseye
  • 20:08 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=ats-be
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=cdn
  • 20:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5031.eqsin.wmnet with OS bullseye
  • 19:53 dancy: The train is blocked on T328601
  • 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS bullseye
  • 19:49 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325584 (duration: 06m 36s)
  • 19:49 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS bullseye
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325584
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=cdn
  • 19:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 19:33 dancy@deploy1002: deploy-promote aborted: (duration: 11m 58s)
  • 19:33 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 03m 38s)
  • 19:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 19:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS bullseye
  • 19:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS bullseye
  • 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 19:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS bullseye
  • 19:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3060.esams.wmnet
  • 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS bullseye
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:55 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:55 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS bullseye
  • 18:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3059.esams.wmnet
  • 18:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS bullseye
  • 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:29 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=cdn
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
  • 18:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS bullseye
  • 18:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3058.esams.wmnet
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS bullseye
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5030.eqsin.wmnet with OS bullseye
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43573 and previous config saved to /var/cache/conftool/dbconfig/20230201-181036-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43572 and previous config saved to /var/cache/conftool/dbconfig/20230201-181031-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43571 and previous config saved to /var/cache/conftool/dbconfig/20230201-181024-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43570 and previous config saved to /var/cache/conftool/dbconfig/20230201-181016-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43569 and previous config saved to /var/cache/conftool/dbconfig/20230201-181011-root.json
  • 18:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 18:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43568 and previous config saved to /var/cache/conftool/dbconfig/20230201-175531-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43567 and previous config saved to /var/cache/conftool/dbconfig/20230201-175526-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43566 and previous config saved to /var/cache/conftool/dbconfig/20230201-175519-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43565 and previous config saved to /var/cache/conftool/dbconfig/20230201-175511-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43564 and previous config saved to /var/cache/conftool/dbconfig/20230201-175506-root.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43563 and previous config saved to /var/cache/conftool/dbconfig/20230201-175446-root.json
  • 17:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS bullseye
  • 17:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3057.esams.wmnet
  • 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS bullseye
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43562 and previous config saved to /var/cache/conftool/dbconfig/20230201-174026-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43561 and previous config saved to /var/cache/conftool/dbconfig/20230201-174021-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43560 and previous config saved to /var/cache/conftool/dbconfig/20230201-174015-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43559 and previous config saved to /var/cache/conftool/dbconfig/20230201-174007-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43558 and previous config saved to /var/cache/conftool/dbconfig/20230201-174001-root.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43557 and previous config saved to /var/cache/conftool/dbconfig/20230201-173941-root.json
  • 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43555 and previous config saved to /var/cache/conftool/dbconfig/20230201-172521-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43554 and previous config saved to /var/cache/conftool/dbconfig/20230201-172516-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43553 and previous config saved to /var/cache/conftool/dbconfig/20230201-172510-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43552 and previous config saved to /var/cache/conftool/dbconfig/20230201-172502-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43551 and previous config saved to /var/cache/conftool/dbconfig/20230201-172456-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43550 and previous config saved to /var/cache/conftool/dbconfig/20230201-172436-root.json
  • 17:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS bullseye
  • 17:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS bullseye
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
  • 17:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43549 and previous config saved to /var/cache/conftool/dbconfig/20230201-171016-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43548 and previous config saved to /var/cache/conftool/dbconfig/20230201-171011-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43547 and previous config saved to /var/cache/conftool/dbconfig/20230201-171005-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43546 and previous config saved to /var/cache/conftool/dbconfig/20230201-170957-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43545 and previous config saved to /var/cache/conftool/dbconfig/20230201-170951-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43544 and previous config saved to /var/cache/conftool/dbconfig/20230201-170931-root.json
  • 16:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43543 and previous config saved to /var/cache/conftool/dbconfig/20230201-165512-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43542 and previous config saved to /var/cache/conftool/dbconfig/20230201-165506-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43541 and previous config saved to /var/cache/conftool/dbconfig/20230201-165500-root.json
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43540 and previous config saved to /var/cache/conftool/dbconfig/20230201-165452-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43539 and previous config saved to /var/cache/conftool/dbconfig/20230201-165446-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS bullseye
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43538 and previous config saved to /var/cache/conftool/dbconfig/20230201-165426-root.json
  • 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43536 and previous config saved to /var/cache/conftool/dbconfig/20230201-164007-root.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43535 and previous config saved to /var/cache/conftool/dbconfig/20230201-164002-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43534 and previous config saved to /var/cache/conftool/dbconfig/20230201-163955-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43533 and previous config saved to /var/cache/conftool/dbconfig/20230201-163947-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43532 and previous config saved to /var/cache/conftool/dbconfig/20230201-163941-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43531 and previous config saved to /var/cache/conftool/dbconfig/20230201-163921-root.json
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS bullseye
  • 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:25 jynus: reloaded apache on mailman
  • 16:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
  • 16:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 16:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 15:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 14:56 sukhe: depooled cp1075.eqiad.wmnet for idrac firmware upgrade testing
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 14:52 awight: EU deployment window complete
  • 14:48 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 awight@deploy1002: Finished scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) (duration: 08m 25s)
  • 14:47 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:41 awight@deploy1002: elukey and awight: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 db2158 db2157 es2026 db2106 db2146 T327404', diff saved to https://phabricator.wikimedia.org/P43530 and previous config saved to /var/cache/conftool/dbconfig/20230201-144152-root.json
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:40 awight@deploy1002: Started scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768)
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:37 awight@deploy1002: Finished scap: Backport for Add cswiki to desktop-improvements group. (T328154) (duration: 09m 22s)
  • 14:29 awight@deploy1002: jdrewniak and awight: Backport for Add cswiki to desktop-improvements group. (T328154) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:28 awight@deploy1002: Started scap: Backport for Add cswiki to desktop-improvements group. (T328154)
  • 14:26 awight@deploy1002: Finished scap: Backport for Squashed diff to catch up to master (duration: 09m 07s)
  • 14:19 awight@deploy1002: awight and mlitn: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:17 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:11 awight@deploy1002: backport aborted: (duration: 06m 09s)
  • 14:11 awight@deploy1002: sync-world aborted: Backport for Squashed diff to catch up to master (duration: 03m 36s)
  • 14:09 awight@deploy1002: mlitn and awight: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:07 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3005.wikimedia.org
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:06 moritzm: updating perf on Bullseye hosts
  • 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3005.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5002.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:47 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5002.wikimedia.org
  • 13:21 moritzm: installing curl security updates on bullseye
  • 13:00 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:59 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2003.codfw.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
  • 12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 12:15 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part III (T308932) (duration: 06m 43s)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:22 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP (duration: 01m 45s)
  • 11:21 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part II (T308932) (duration: 07m 04s)
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 11:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP (duration: 00m 51s)
  • 11:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP
  • 11:14 ladsgroup@deploy1002: Synchronized wmf-config/ext-CirrusSearch.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part I (T308932) (duration: 07m 04s)
  • 11:01 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0] (duration: 01m 18s)
  • 11:00 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0]
  • 10:59 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0] (duration: 00m 05s)
  • 10:59 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0]
  • 10:58 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0] (duration: 04m 29s)
  • 10:54 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0]
  • 10:52 steve_munene: Deploying refinery for ops week
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:42 zabe: start running migrateRevisionCommentTemp in remaining sections (for now except s3) in screens # T275246
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:01 godog: upgrade grafana to 8.5.20 on cloudmetrics* - T328405
  • 09:57 godog: upgrade grafana to 8.5.20 on grafana1002 - T328405
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
  • 09:47 godog: upgrade grafana to 8.5.20 on grafana2001 - T328405
  • 09:15 urbanecm: Clean sign up throttle for IP 195.113.145.2 (via resetAuthenticationThrottle.php; T328521)
  • 09:14 urbanecm@deploy1002: Finished scap: Backport for Add new throttle rule (T328521) (duration: 07m 24s)
  • 09:07 urbanecm@deploy1002: Started scap: Backport for Add new throttle rule (T328521)
  • 09:06 urbanecm@deploy1002: backport aborted: (duration: 00m 01s)
  • 09:05 ladsgroup@deploy1002: Finished scap: Backport for Create additional namespaces on shn.wikibooks (T327850) (duration: 15m 06s)
  • 08:54 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 08:54 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:52 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Create additional namespaces on shn.wikibooks (T327850) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 08:50 ladsgroup@deploy1002: Started scap: Backport for Create additional namespaces on shn.wikibooks (T327850)
  • 08:49 ladsgroup@deploy1002: Finished scap: Backport for Add a wordmark to trwiktionary (T328499) (duration: 08m 05s)
  • 08:45 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=k8s-ingress-staging
  • 08:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=k8s-ingress-staging
  • 08:42 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add a wordmark to trwiktionary (T328499) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:41 ladsgroup@deploy1002: Started scap: Backport for Add a wordmark to trwiktionary (T328499)
  • 08:40 ladsgroup@deploy1002: Finished scap: Backport for Add mobile wordmark to cswiktionary (T328357) (duration: 12m 26s)
  • 08:29 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add mobile wordmark to cswiktionary (T328357) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:27 ladsgroup@deploy1002: Started scap: Backport for Add mobile wordmark to cswiktionary (T328357)
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:27 ladsgroup@deploy1002: Finished scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) (duration: 09m 42s)
  • 08:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 08:19 ladsgroup@deploy1002: ladsgroup and krinkle: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:17 ladsgroup@deploy1002: Started scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)
  • 08:14 ladsgroup@deploy1002: Finished scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) (duration: 10m 15s)
  • 08:06 ladsgroup@deploy1002: phedenskog and ladsgroup: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:05 moritzm: installing libarchive security updates
  • 08:04 ladsgroup@deploy1002: Started scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700)
  • 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55821
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55821
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43524 and previous config saved to /var/cache/conftool/dbconfig/20230201-073348-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43523 and previous config saved to /var/cache/conftool/dbconfig/20230201-071841-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43522 and previous config saved to /var/cache/conftool/dbconfig/20230201-070335-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43521 and previous config saved to /var/cache/conftool/dbconfig/20230201-064828-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43520 and previous config saved to /var/cache/conftool/dbconfig/20230201-064311-ladsgroup.json
  • 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:38 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3055.esams.wmnet
  • 00:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS bullseye
  • 00:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS bullseye
